ISSN 2158-5296

Analytical Approaches to World Musics


AAWM Journal 12/1 (2024)

Towards a Cross-Cultural Theory of Tone-Tune Mapping: A Comparative Study of Cantonese and Yorùbá[1]

 Edwin K. C. Li, Aaron Carter-Ényì, and David Àìná

Cantonese; Yorùbá; tone-tune mapping; comparative analysis

While recent research has investigated tone-tune mapping in diverse regions of Asia, Africa, and Meso-America, a cross-cultural understanding of tone-tune mapping remains limited by variables such as the role of tone in language comprehension, sample size, and musical genre. This article aims to lay a collaborative groundwork for a cross-cultural theory of tone-tune mapping by comparing two well-studied tone languages: Cantonese and Yorùbá. Examining text-setting practices in the two languages with a constraint-based approach, this article explores seven aspects of tone-tune mapping in Cantonese and Yorùbá, namely, tonology, genre, interval size, pitch reset at text phrase/prosodic boundaries, oblique settings and declination, contour tones, and tone-tune independence. Original music analyses explore the musical-linguistic constraints at the intersection of these features in relation to lyric intelligibility and listener perception. The article concludes that while cross-cultural tone-tune mapping models may not fully capture the complexities across musical-linguistic cultures, global preference rules do emerge from localized constraints.


Edwin K. C. Li is Assistant Professor of Musicology at The Chinese University of Hong Kong.

Aaron Carter-Ényì is Director of the Africana Digital Ethnography Project (ADEPt) at Morehouse College in Atlanta, Georgia.

David O. Àìná is retired from Lagos State University, Nigeria, and is currently Director of Music at Epiphany Lutheran Church of Baltimore, Maryland.




In 2012, Murray Schellenberg conducted a thorough examination of published case studies of languages such as Hausa, Thai, Xhosa, Cantonese, and Zulu to address the question: “Does language determine music in tone languages?” The findings revealed wide variation in the degree to which a musical melody is influenced by speech tones. Some case studies presented compelling evidence that speech tones are mapped onto melody, while others indicated little to no correlation between the two domains. Over the past decade, there has been a surge in studies of tone-tune mapping from diverse regions worldwide, including Asia (Cantonese by Li 2021, Vietnamese by Kirby & Ladd 2016, and Thai by Tanprasert & Rockwell 2021), Africa (Ìgbò by Carter-Ényì 2016, Tommo So by McPherson & Ryan 2018, Yorùbá by Carter-Ényì 2016, and Zulu by Pooley 2020), and Meso-America (Tù’un Ndá’vi, a Mixtecan language of Mexico, by Sleeper & Basurto 2022). Despite these studies, the relationship between tone and tune remains elusive; the extent to which speech tone influences sung melody, and vice versa, varies considerably within and across cultures. Numerous factors may account for this variation. They include the role of tone in the language itself (how important is tone to language comprehension? Is tone equally important for semantic units of all lengths?), sample size (how many samples are needed to establish that tone-tune mapping is happening?), and musical genre (does a particular musical genre privilege certain aesthetics and/or techniques of tone-tune mapping? How should a cross-cultural analysis be initiated if a musical genre, and the theoretical language used to understand it, are exclusive to one ethno-linguistic culture?).

[2] In tackling these questions, however, scholars from fields such as anthropology, ethnomusicology, linguistics, music theory, and psychology have often worked in isolation from literature outside their own discipline(s) and the language(s) they study. Even within a single field, there have been notable disagreements about appropriate methodologies (as exemplified by Leben’s 1983 response to Richards 1972). Consequently, comparing studies from the same culture, let alone across cultures, has proven challenging, as Schellenberg (2012) highlights. This article therefore aims to lay a collaborative groundwork for a cross-cultural theory of tone-tune mapping by pooling expertise on this topic. It compares two tone languages spoken in distinct ethno-linguistic regions, both among the best-studied exemplars of tone-tune mapping: Cantonese (a Sino-Tibetan language spoken in southeastern China, Hong Kong, and Macau) and Yorùbá (a Niger-Congo language spoken in southwestern Nigeria and the Benin Republic). Despite their geographical distance, both languages possess lexical tones (distinctive pitch levels of a syllable that determine lexical meaning) with a high functional load for lyric intelligibility; that is, accurate tone-tune mapping is essential to understanding lyrics in music.[2] This shared characteristic provides fertile ground for comparative analysis.

[3] Following Laura McPherson and Kevin M. Ryan (2018), we propose a constraint-based approach to developing a cross-cultural theory of tone-tune mapping. To this end, we analyze text-setting practices in Cantonese and Yorùbá, two tone languages whose tones form minimal pairs (pairs of words that differ in meaning by only one phonological element, such as a phoneme) and/or play a role in shaping grammatical structures. Our aim is to explore the musical-linguistic constraints at the intersection of seven aspects of tone-tune mapping in relation to lyric intelligibility and listener perception: (1) tonology; (2) genre; (3) interval size; (4) pitch reset at prosodic boundaries; (5) oblique settings and declination; (6) contour tones; and (7) tone-tune independence. We argue that while cross-language tone-tune mapping models may not fully capture the complexities across musical-linguistic cultures, global preference rules do emerge from localized constraints (see Kirby 2021).

[4] More broadly, by presenting a comparative analysis of tone-tune mapping in Cantonese and Yorùbá, we offer a cross-cultural outlook on music theory that goes beyond the predominantly Eurocentric discourse in music analysis. European elements such as equal-tempered diatonic scales and European-designed instruments, notably keyboards, did influence some of the musical traditions of Cantonese and Yorùbá cultures during colonization and Christian missionization. However, we underline that some musical-linguistic features evolved independently of those European elements and therefore warrant new analytical approaches. Comparing tone-tune relationships across musical genres such as Cantonese popular music (henceforth Cantopop) and both indigenous and Islamicized musics in Yorùbá, we examine how musicians, from Ian Chan in Hong Kong to Àṣàkẹ́ in Nigeria, navigate the intricate dynamics of this cross-domain relationship and its music-theoretical potential.

A Constraint-Based Approach to Tone-Tune Mapping

[5] McPherson and Ryan (2018) have formalized a grammatical model of tone-tune mapping in terms of Optimality Theory (OT). OT is a constraint-based approach to the generation of languages, built on violable constraints that are ranked (or, in some variants, weighted), and it unites the characterization of individual languages with generalized linguistic typologies (see McCarthy 2002). In linguistics-only applications of OT, input and output are both linguistic: for example, an underlying representation (i.e., the abstract concept of a word) is mapped onto a surface realization (i.e., the spoken version of the word). Tone-tune mapping, by contrast, crosses domains from a linguistic input to a musical output, or rather, to an output that is both linguistic and musical. What linguistics-only OT and OT for tone-tune mapping share is a set of constraints, ranked differently to account for diversity, that resolves the conflicts arising among those constraints. At a basic level, tone-tune mapping includes a preference for matching linguistic and musical tone combinations, or bigrams (pairs of consecutive units). An example is a similar setting, in which the lexical contour and the melodic contour move in the same direction. This cross-domain relationship generally also includes a constraint against mismatching bigrams, such as a contrary setting, in which the lexical contour and the melodic contour move in opposite directions.

[6] As D. Robert Ladd (forthcoming) points out, certain constraints appear to be consistent across well-studied languages and musical genres:

Like the artistic constraints involved in meter and rhyme, the prohibition on mismatched pitch direction is not absolute, but the empirical evidence that such a principle exists now comes from many unrelated tone languages, and the basic idea of avoiding such mismatches is now shared across most recent work on tone-melody matching.

[7] The relative strength of the functional load of tone plays a crucial role in the ranking of the constraints against tone-tune mismatches. In a language where tone is critical to comprehension, tone-tune mismatches would be highly constrained, with the constraint against them outranking musical constraints that might lead to mismatches. In other languages, where tone is present but has a relatively low functional load (e.g., Zulu), the constraint against tone-tune mismatches might be outranked by musical constraints on melodic realization. In what follows, we examine seven aspects of tone-tune mapping in Cantonese and Yorùbá through this constraint-based approach in order to highlight, echoing Annett Schirmer et al. (2005, 9), “the significance of cross-language comparisons for modelling both the processes that all languages share and the processes that are language specific.”
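The effect of reranking can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not McPherson and Ryan's model: the constraint names *CONTRARY and MATCH-MOTIF, the tonal targets, and the candidate melodies are all hypothetical, and candidates are evaluated by strict domination (a lexicographic comparison of violation counts), as in classic OT.

```python
from typing import Callable, Sequence

Melody = tuple[int, ...]              # one MIDI pitch per syllable
Constraint = Callable[[Melody], int]  # returns a number of violations


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def star_contrary(tones: Sequence[int]) -> Constraint:
    """*CONTRARY (hypothetical name): one violation per bigram whose melody moves against the tone."""
    def violations(melody: Melody) -> int:
        return sum(
            1
            for i in range(len(tones) - 1)
            if sign(tones[i + 1] - tones[i]) * sign(melody[i + 1] - melody[i]) < 0
        )
    return violations


def match_motif(motif: Melody) -> Constraint:
    """MATCH-MOTIF (hypothetical musical constraint): one violation per note departing from a motif."""
    def violations(melody: Melody) -> int:
        return sum(1 for note, wanted in zip(melody, motif) if note != wanted)
    return violations


def optimal(candidates: Sequence[Melody], ranking: Sequence[Constraint]) -> Melody:
    """Strict domination: compare violation profiles lexicographically, top-ranked constraint first."""
    return min(candidates, key=lambda melody: tuple(c(melody) for c in ranking))


tones = [5, 3, 5]                   # hypothetical tonal targets
motif = (67, 69, 71)                # hypothetical preferred tune: G4 A4 B4
candidates = [motif, (69, 67, 71)]  # the motif, and a tone-faithful alternative

tone_critical = [star_contrary(tones), match_motif(motif)]  # high functional load
tone_light = [match_motif(motif), star_contrary(tones)]     # low functional load
print(optimal(candidates, tone_critical))  # (69, 67, 71)
print(optimal(candidates, tone_light))     # (67, 69, 71)
```

With *CONTRARY ranked highest, the candidate that departs from the motif but avoids contrary motion wins; ranking the hypothetical motif constraint above it lets the motif win despite its contrary bigram (5 to 3 set to a rising step).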

Tone-Tune Mapping in Cantonese and Yorùbá
1a) Cantonese Tonology

[8] Tone-tune mapping has played a pivotal role in the aesthetics of lyric comprehension in Cantonese musics, notably in traditional opera (Yung 1989), popular music (Chow 2012; Li 2021; Lee 2023), and contemporary art music (Chan 2021). Instances of non-correspondence between speech tones and melody in Cantonese musics usually stem from religious and historical factors (not least in Cantonese Christian hymns; Kan 2023) or from individual artistic choices (Li 2021).

[9] There are nine tones in Cantonese, with six distinct pitch levels. Traditionally, they are divided into two tone-groups: level and contour. The level tone-group comprises only level tones (i.e., relatively steady-state tones), while the contour tone-group includes non-level tones (i.e., rising, falling, and entering tones). Ideally, rising tones exhibit an ascending pitch contour and falling tones a descending one, while entering tones end on an unreleased consonant (-p, -t, or -k) and do not exhibit any pitch contour. There is, however, some ambiguity about falling tones: low level tones are sometimes given a descending pitch contour, and high falling tones are sometimes normalized as a steady pitch (see Chow 2012, 12). Modern tone-letter systems usually divide Cantonese speech tones into three categories: level tones, rising (or contour) tones, and entering tones. Figure 1 shows one tone-letter system categorizing the nine tones in Cantonese. The tone letters indicate the relative pitch of each tone on a scale from 1 to 5, with 1 representing the lowest pitch and 5 the highest. For example, 星 /sing55/ (star) is a level tone whose pitch stays at 5, while 醒 /sing25/ (awake) is a high rising tone that begins with an onset on 2 and rises to a final on 5.[3] 式 (type) is Romanized as /sik5/ instead of /sik55/, as it is an entering tone that ends on an unreleased consonant and does not exhibit any pitch contour. The final of a speech tone is also called its tonal target.

Figure 1. A tone-letter characterization of Cantonese speech tones.

[10] Linguist Marjorie K. M. Chan (1987) has argued that the perception of tone-tune mapping in Cantonese concerns the transition between the finals of successive speech tones. This transition, called the tonal target transition (TTT), can be described in terms of its direction: ascending, descending, or level.[4] For example, the TTT from 勝 /sing33/ to 醒 /sing25/ ascends from 3 to 5, because the final of /sing33/ is 3 and that of /sing25/ is 5.
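Because the TTT depends only on the final tone letter of each syllable, Chan's end-point principle can be sketched in Python as follows. The sketch assumes romanizations written with tone letters as in Figure 1 (e.g., sing33, sing25, sik5); the function names are our own, and the parser does not validate the syllable itself.

```python
import re


def tone_letters(syllable: str) -> str:
    """Return the trailing tone letters of a romanized syllable, e.g. 'sing25' -> '25'."""
    match = re.search(r"(\d+)$", syllable)
    if match is None:
        raise ValueError(f"no tone letters in {syllable!r}")
    return match.group(1)


def tonal_target(syllable: str) -> int:
    """The tonal target is the final tone letter (for entering tones, the only one)."""
    return int(tone_letters(syllable)[-1])


def ttt(first: str, second: str) -> str:
    """Direction of the tonal target transition between two successive syllables."""
    a, b = tonal_target(first), tonal_target(second)
    if b > a:
        return "ascending"
    if b < a:
        return "descending"
    return "level"


print(ttt("sing33", "sing25"))  # ascending: 3 -> 5
print(ttt("dou55", "jau23"))    # descending: 5 -> 3
print(ttt("sing55", "sing25"))  # level: both targets are 5
```

The last call shows what the end-point principle abstracts away: /sing55/ and /sing25/ share a target, so the transition reads as level even though the second syllable rises.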

[11] In the context of Cantopop, Wing-see Vincie Ho (2010) illustrates Chan’s argument by comparing verses set to the same melody in Eason Chan’s “Fallen Flowers on Flowing Water” (落花流水) (2006), as reproduced in Figure 2.[5] The underlined text in the figure shows that the two strophic texts sung to the same melody have different underlying tones but the same TTT. Two bigrams are each set to a descending major second: high level tone 都 /dou55/ with low rising tone 有 /jau23/, and high level tone 些 /se55/ with mid level tone 既 /gei33/. Both pairs sound natural to a native speaker in the musical context: each involves a descending TTT from 5 to 3 and is set to an optimal interval (we discuss optimal intervals in Cantonese tone-tune mapping later). This understanding of tone-tune mapping, however, has been critiqued in recent studies. Yin Hei (Jason) Lee (2023, 36–58), for one, examining a corpus of Cantopop songs, challenges the end-point principle embedded in the theory of TTT. He argues that level tones and contour tones with the same tonal target (e.g., /sing55/ and /sing25/) should not be heard as equivalent. Rather, he claims, the upward glide in contour tones may offer an additional perceptual cue that helps listeners understand musical lyrics, thus relaxing the text-setting constraints against contour tones.

Figure 2. An excerpt of Eason Chan’s “Fallen Flowers on Flowing Water” (2006) (based on Ho 2010, 47).

1b) Yorùbá Tonology

[12] The very first Yorùbá dictionary describes Yorùbá as a “very musical” language (Crowther 1852, 3). Its author, Samuel Ajayi Crowther, insisted that marks be added to letters to indicate rises and falls of pitch in Yorùbá, decades before the Basel Mission linguist Johann Gottlieb Christaller applied a similar method to Akan in Ghana. While Cantonese combines level and contour tones to construct minimal pairs, Yorùbá is a level-tone language in which the relative pitch height between consecutive tones in speech carries phonemic weight (Welmers 1973). Pitch glides in Yorùbá speech do not usually distinguish the meaning of words. The phonemic tone levels (or tonemes) are High, marked with an acute accent (´); Mid, left unmarked; and Low, marked with a grave accent (`). Table 1 presents a collection of bigrams derived from the Yorùbá disyllable /igba/ with various combinations of tone levels.

Homophone | Tone Levels | Gloss
igbá | Mid-High | calabash (a dried gourd cup)
igba | Mid-Mid | two hundred
ìgbá | Low-High | garden egg (eggplant)
igbà | Mid-Low | climbing-rope for palm trees
ìgbà | Low-Low | time

Table 1. /igba/ homophone group in Yorùbá (based on Abraham 1962, 282–4).
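Because tone is marked with combining accents, the tone-level sequence of a Yorùbá word can be read mechanically from its spelling. The sketch below is a simplified illustration, not a complete tokenizer: it assumes that every vowel letter is one tone-bearing unit, maps an acute accent to High, a grave accent to Low, and no accent to Mid, ignores the dot below (which marks vowel quality, not tone), and does not handle syllabic nasals such as ń. It relies only on Python's standard unicodedata module.

```python
import unicodedata

VOWELS = set("aeiouAEIOU")
ACUTE = "\u0301"  # combining acute accent: High
GRAVE = "\u0300"  # combining grave accent: Low


def tone_levels(word: str) -> list[str]:
    """Return one tone level per vowel letter of a Yorùbá word (simplified)."""
    # NFD splits precomposed letters (e.g. 'ẹ́') into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    levels: list[str] = []
    i = 0
    while i < len(decomposed):
        char = decomposed[i]
        i += 1
        if char not in VOWELS:
            continue
        # Collect the combining marks that follow this vowel (dot below, accents).
        marks = ""
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks += decomposed[i]
            i += 1
        if ACUTE in marks:
            levels.append("High")
        elif GRAVE in marks:
            levels.append("Low")
        else:
            levels.append("Mid")
    return levels


for word in ["igbá", "igba", "ìgbá", "igbà", "ìgbà"]:
    print(word, "-".join(tone_levels(word)))
```

Run on the five homophones, the function reproduces the Tone Levels column of Table 1.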

[13] The literature on the relationship between language and music in Yorùbá culture dates back a century. By the 1920s, many “native airs” had been composed by Yorùbá Christians because no existing hymn tune could “express the meaning of the words in a ‘tonic’ language” (Ransome-Kuti 1923, iii). In 1922, the Yorùbá Anglican clergyman J. J. Ransome-Kuti travelled from his hometown of Abeokuta, Nigeria, to the Church Missionary Society Exhibition in London, England. While there, Ransome-Kuti sang and played the piano on recordings of forty-three original songs, later released by the Gramophone Company in 1925 (Delano 1942). These recordings were recently re-released as part of the large audio collection Over there! Sounds and Images of Black Europe (2013).[6] These “native airs” were gathered in the Yorùbá Anglican hymnal “Iwe Orin Mimo (Book of Holy Songs),” which remains in print and is still used in Anglican churches in southwestern Nigeria a century later. Ransome-Kuti was the first composer to make an international mark with Yorùbá-language music, but he was not the last. Western-educated Yorùbá composers have long published works articulating the nuances of setting the Yorùbá language to music, including Thomas King Ekundayo Phillips (1952), Akin Euba (2001), and Stephen Oluranti (2012). To this day, “correct” text-settings are taken very seriously as part of the rubric for assessing the quality of vocal and choral compositions. At the Forum for Inculturation of Liturgical Music, a biennial competition of university chapel choirs, for example, an incorrect setting can make one’s work less competitive (Carter-Ényì & Carter-Ényì 2019). “Iṣe Olúwa,” reproduced in Figure 3, is among the earliest known Christian songs in Yorùbá with a melody written specifically for the text. Not all of its melodic intervals match the direction implied by the tone-level bigrams. Notably, two Mid-High bigrams (on “Iṣẹ́” and “Olú-”) are set to repeated pitches. Such “oblique” settings are discussed extensively in the subsequent sections.

Figure 3. “Iṣe Olúwa” (Yorùbá Native Air).

1c) Comparison of Tonologies

[14] Although Sino-Tibetan languages (e.g., Cantonese and Mandarin) and Niger-Congo languages (e.g., Yorùbá) belong to entirely distinct language families, tone plays an important role in many languages of both families. In fact, Yorùbá and Mandarin may represent the level-tone and contour-tone extremes, respectively, of the spectrum of tone-language typologies. Each of the four speech tones in Mandarin has a unique trajectory (level, rising, falling-rising, and falling), so the relative pitch height within a tone bigram is less important for perception. In level-tone languages like Yorùbá, however, comparing the pitch height of two adjacent tone-bearing units (often syllables, or more specifically, moras, the units of syllable weight; a short syllable contains one mora) is critical.

[15] While every speech tone has a unique trajectory in Mandarin, that is not the case for all Sino-Tibetan languages, notably Cantonese, which includes both level tones and contour tones. Since Cantonese has a broad inventory of tones, there is more potential for a cross-cultural comparison of tone-setting between Yorùbá and Cantonese than between Yorùbá and Mandarin. Yorùbá includes some contour tones (particularly rising ones), but they do not form the basis for minimal pairs. Even so, Cantonese and Yorùbá overlap in tone trajectories such as Mid-High and Mid-Mid.

[16] Comparing languages with distinct typologies (like Mandarin and Yorùbá) requires a phonological theory that accommodates all types of tones. The same is true for the study of a language that has both level and contour tones (like Cantonese). In autosegmental phonology, Goldsmith (1976) proposed independent tiers for phonemes and tonemes, allowing tone sequences to be timed relatively independently of other tiers (instead of requiring a one-to-one ratio between tones and tone-bearing units). Within this framework, Cantonese contour tones (e.g., the high rising tone /25/) can be compared to Low-High in Yorùbá. While the tones of Yorùbá are phonologized as Low, Mid, and High, glides exist between tones.[7] In standard Yorùbá orthography, gliding pitches are shown as discrete tones on repeated tone-bearing units (e.g., /a/ with a rising tone would be written “àá”), with one tone per tone-bearing unit (Bamgbose 2000). A High tone after a Low tone is phonetically more like Low rising to High (Low-High), but these nuances of tone realization in speech are not represented in text-setting and singing in most contemporary musics.

2) Musical Genres

[17] In cultures like Cantonese and Yorùbá, while there is evidence that musicians are intentionally mapping speech tones onto sung melodies, the approach is not consistent across genres or even within genres. Both Cantonese and Yorùbá cultures possess what may be broadly categorized as traditional, popular, and contemporary art music. It is often the case that traditional (or indigenous) music is the least westernized and is the most similar to the spoken language. In these two cultures, Cantonese opera and Yorùbá Oríkì (praise chant) include chant-like modes that closely follow lexical tones. In Cantonese opera, for instance, there is a technique of “plain speech” (baak/nimbaak), in which the opera singer delivers the “music” in a way that is close to speaking vernacular Cantonese, not unlike recitative in European opera. Yet, as Bell Yung (1989) describes, baak differs from vernacular Cantonese in its literary style, exaggerated rhythmic patterns, and ultra-clear pitch contour articulations. Likewise, in Yorùbá Oríkì, the realization of tones is more crystalline than that in speech. Figure 4 shows a transcription by Adegbite (1978) of an Oríkì in which the bottom three staff spaces are used to represent the three tone levels of Yorùbá (from the lowest space, Low, Middle, and High). Glides between tone levels are represented by curved marks that cross staff lines.

Figure 4. Transcription of a Yorùbá Oríkì chant by Adegbite (1978, 131).

[18] In Cantonese baak and Yorùbá Oríkì, linguistic features and musical aesthetics are not in conflict, but fused. However, baak and Oríkì might be considered stylized speech or chant, and not music per se. They are certainly a far cry from the westernized popular music found in the respective locations.

[19] Music with intercultural elements, such as commercial popular music influenced by American pop or hip hop, tends to vary in its attention to speech tone. While some artists carefully preserve speech tones in their melodic settings, others largely ignore them. This is the case in both contemporary Nigerian popular music (Afrobeats) and Cantopop. Commercial recording artists with international ambitions for their fandom, such as MIRROR in Cantopop and Àṣàkẹ́ in Afrobeats, have aesthetic concerns beyond the local area that transcend, or perhaps defy, indigenous aesthetics. As in many westernized musics, Cantopop and Afrobeats tend toward stable-pitch singing on the diatonic scale and the use of chord progressions. Automated pitch correction (autotune) may also be used, further conforming the natural pitch contours of speech to stable pitches in equal temperament.
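What pitch correction does to a speech-like glide can be approximated by snapping each pitch estimate to the nearest equal-tempered semitone. The Python sketch below is a deliberately simplified model of "hard" correction, not a description of how any particular commercial tool works: it assumes A4 = 440 Hz, operates on a list of frame-by-frame frequency estimates rather than on audio, and uses a hypothetical strength parameter (1.0 snaps fully, 0.0 leaves the pitch untouched).

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency to a (fractional) MIDI note number in 12-tone equal temperament."""
    return 69 + 12 * math.log2(freq_hz / A4_HZ)


def midi_to_hz(note: float) -> float:
    return A4_HZ * 2 ** ((note - 69) / 12)


def correct_pitch(contour_hz: list[float], strength: float = 1.0) -> list[float]:
    """Pull each frame toward the nearest equal-tempered semitone.

    strength is a hypothetical parameter: 1.0 snaps fully, 0.0 changes nothing.
    """
    corrected = []
    for freq in contour_hz:
        note = hz_to_midi(freq)
        nearest = round(note)
        corrected.append(midi_to_hz(note + strength * (nearest - note)))
    return corrected


# A hypothetical rising glide of roughly a whole step (about A3 to B3).
glide = [220.0, 226.0, 233.0, 240.0, 247.0]
for before, after in zip(glide, correct_pitch(glide)):
    print(f"{before:6.1f} Hz -> {after:6.1f} Hz")
```

At full strength, the continuous five-frame rise collapses into three discrete pitch plateaus, which is the sense in which correction pushes a sung tone contour toward stable, equal-tempered pitches.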

3a) Interval Size in Cantonese

[20] While a native Cantonese speaker can intuitively judge whether a tone-tune mapping is natural, it is challenging for a non-native speaker to understand how a TTT maps onto a melodic interval. Examining a corpus of Cantopop, Man-ying Chow (2012) has compiled a table, reproduced as Table 2, outlining the optimal melodic intervals for level and non-level TTTs.[8] While the optimal melodic interval for a level TTT is a repeated pitch (unison), level TTTs can also be mapped onto either a falling or a rising melody.

 

Table 2. Table of the optimal melodic intervals for level and non-level tonal target transitions (based on Chow 2012, 6; phonetic transcription added by the authors).

[21] Kai-Young Chan’s (2021) study refines the understanding of intervallic constraints in Cantonese tone-tune mapping by placing interval sizes in their pitch and tonal contexts. For instance, he shows through perception tests that the intelligibility of the bigram 東京 (Tokyo) /dung55ging55/, which according to Chow (2012) is best set to a repeated pitch, varies across contexts. In C major, when /dung55ging55/ was set to a repeated C3 or a repeated F3, only 50% and 75% of respondents, respectively, found the bigram intelligible, whereas all respondents found it intelligible when it was set to a repeated D3, E3, G3, A3, or B3.[9] Chan underlines that optimally intelligible text-setting is a fluid concept that depends on intervallic and pitch/tonal context.

3b) Interval Size in Yorùbá

[22] In Yorùbá musics which have adapted to Western tonal harmony, speech tone bigrams (conceptually analogous to TTT in Cantonese) correspond to musical intervals. Ideally, both the direction and magnitude of the tone-level changes are represented in text-settings.

Figure 5. Semitone difference thresholds for Yorùbá tone bigrams (from Carter-Ényì & Carter-Ényì 2016).

[23] Figure 5 shows the tone bigram difference thresholds in Yorùbá. A descending step (one or two semitones) is somewhat ambiguous; it could represent a level bigram (LL, MM, HH) or a step down to an adjacent level (HM or ML). The tonal space is thus asymmetrical: while a step up clearly signals a change in tone level, a step down of two semitones may or may not signal a change in tone height. This is because a downward interval of two semitones lies right at the inflection point between a repeated tone bigram (as in MM) and a bigram that descends one tone level (as in ML). Table 2 (on Cantonese intervals in the previous section) and Figure 5 (on Yorùbá intervals) are used for a comparison of interval sizes in Table 4.
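The asymmetry can be expressed as a small decision procedure. The Python sketch below encodes only the thresholds stated in the prose (a rising step reads as an upward change of tone level; a descent of one or two semitones is ambiguous between a level bigram and a one-level descent; three or more descending semitones clearly mark a descent) and does not reproduce the full thresholds of Figure 5. It also does not check whether a two-level bigram such as HL receives a larger interval than a one-level one. The function names and the L/M/H string encoding are our own.

```python
TONE_HEIGHT = {"L": 0, "M": 1, "H": 2}  # Low, Mid, High


def read_interval(semitones: int) -> str:
    """Classify a melodic interval (positive = rising) by how it reads tonally."""
    if semitones >= 1:
        return "up"          # a rising step is clearly a change of tone level
    if semitones == 0:
        return "level"
    if semitones >= -2:
        return "ambiguous"   # a 1-2 semitone descent: level bigram or one level down
    return "down"            # 3+ semitones down: clearly a lower tone level


def compatible(bigram: str, semitones: int) -> bool:
    """Is a tone bigram such as 'MH' compatible with a melodic interval?"""
    first, second = bigram
    change = TONE_HEIGHT[second] - TONE_HEIGHT[first]
    reading = read_interval(semitones)
    if reading == "ambiguous":
        return change in (0, -1)   # level, or down one tone level
    expected = "up" if change > 0 else "down" if change < 0 else "level"
    return reading == expected


print(compatible("MH", 2))   # True: a whole step up reads as Mid to High
print(compatible("ML", -2))  # True, but only ambiguously
print(compatible("MM", -2))  # True as well: the same interval can read as level
print(compatible("ML", -3))  # True: an unambiguous descent
print(compatible("MH", 0))   # False: an oblique setting
```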

4) Pitch Reset at Prosodic Boundaries

[24] Ho (2010) notes that even when the direction of a musical interval contradicts the TTT, native Cantonese speakers may still find the tone-tune mapping natural in Cantopop. In Figure 6, the dotted line marks an ascending TTT 3-5 (from 角 /gok3/ to 都 /dou55/), set against a descending minor second in the melody. Ho argues that this contradiction is permissible because the constraints governing tone-tune mapping can be lifted at a prosodic boundary (i.e., the end of one phonological phrase and the beginning of another). While a prosodic boundary often coincides with the boundary of a melodic phrase, Ho underlines that this alignment is not a necessary condition for a reset of pitch height and a break in the tonological contour.

Figure 6. Musical interval contradicting tonal target transition. From Leon Lai’s “I Love You So” (我這樣愛你) (1998) (based on Ho 2010, 57).

[25] This principle, whereby multiple prosodic phrases may be contained within a melodic phrase (in the Western tonal sense), also holds for popular musics sung in Yorùbá. Figure 3 (“Iṣe Olúwa”) supports the notion of pitch reset at prosodic and melodic boundaries, and certainly at aligned boundaries. The excerpt includes four prosodic phrases (two measures each), each with a pitch reset. The tonological boundaries are made clear, even without tone marks, by the diatonically transposed version of “kole baje o” in mm. 7–8. In popular music sung in Yorùbá, it is uncommon for a musical phrase to end while the prosodic phrase continues.
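In constraint terms, pitch reset means that bigrams spanning a prosodic boundary are simply not evaluated. Below is a minimal Python sketch, assuming each syllable is annotated with a tonal target, a MIDI pitch, and a prosodic-phrase index. The two syllables come from Figure 6, but the pitches (a descending minor second, E4 to E-flat 4) and the phrase indices are hypothetical stand-ins, not a transcription.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    text: str
    target: int   # tonal target (tone-letter final, 1-5)
    pitch: int    # MIDI note number
    phrase: int   # index of the prosodic phrase containing the syllable


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def contrary_bigrams(
    syllables: list[Syllable], reset_at_boundaries: bool = True
) -> list[tuple[str, str]]:
    """List bigrams whose melodic direction opposes their tonal target transition."""
    found = []
    for a, b in zip(syllables, syllables[1:]):
        if reset_at_boundaries and a.phrase != b.phrase:
            continue  # pitch reset: no tone-tune constraint across a prosodic boundary
        if sign(b.target - a.target) * sign(b.pitch - a.pitch) < 0:
            found.append((a.text, b.text))
    return found


# Hypothetical annotation: /gok3/ ends one prosodic phrase, /dou55/ begins the next.
line = [Syllable("gok3", 3, 64, phrase=0), Syllable("dou55", 5, 63, phrase=1)]
print(contrary_bigrams(line))                             # []
print(contrary_bigrams(line, reset_at_boundaries=False))  # [('gok3', 'dou55')]
```

Switching the reset off shows that the same bigram would otherwise count as a contrary setting.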

5) Oblique Settings and Declination

[26] Writing on Cantopop, Ho (2010) distinguishes two types of oblique settings: (1) a rising or falling TTT mapped onto a level melody, and (2) a rising or falling melodic setting of a level TTT. Similarly, Ladd (forthcoming) describes the two types of oblique settings as follows: “Either the musical sequence is unchanging and the tonal sequence rises or falls (oblique I), or the tonal sequence is unchanging and the accompanying melody rises or falls (oblique II).” Oblique I is extremely rare in Cantopop. This type of mapping may be constrained because a level melody flattens out the distinctions between speech tones that characterize the tone language, which is likely to cause semantic confusion. Ladd also notes that oblique II bigrams, where the tone remains the same but the musical pitch rises or falls, are far more common than oblique I in the Cantopop corpus.

[27] If a level TTT is mapped onto a rising or falling melody, it is optimally followed by a tone with a different tonal target. Figure 7 shows how a TTT of 5-5-5-5 is mapped onto a falling melody, punctuated by a mid-level /3/, whose role is to re-orient listeners to an optimal tone-tune mapping between a 5-3 TTT and a falling major second.

Figure 7. Tonal target transition 5-5-5-5 on a falling melody in Stephanie Cheng’s “Traffic Light” (紅綠燈) (2006) (based on Ho 2010, 62).
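The four-way distinction (similar, contrary, oblique I, oblique II) depends only on the signs of the tonal and melodic changes, so it can be computed bigram by bigram. In the Python sketch below, the tonal targets follow the 5-5-5-5-3 pattern described for Figure 7, but the MIDI pitches are hypothetical (a stepwise descent ending in a major second); labeling a level TTT on a repeated pitch as "similar" is our own convention.

```python
def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def classify(tone_a: int, tone_b: int, pitch_a: int, pitch_b: int) -> str:
    """Label one bigram by comparing the direction of tone and melody."""
    tone_dir, melody_dir = sign(tone_b - tone_a), sign(pitch_b - pitch_a)
    if tone_dir == melody_dir:
        return "similar"      # same direction, including a level tone on a repeated pitch
    if tone_dir * melody_dir < 0:
        return "contrary"     # opposite directions
    if melody_dir == 0:
        return "oblique I"    # tone moves, melody stays level
    return "oblique II"       # tone stays level, melody moves


targets = [5, 5, 5, 5, 3]          # TTT pattern described for Figure 7
pitches = [76, 74, 72, 71, 69]     # hypothetical falling line (MIDI): E5 D5 C5 B4 A4
labels = [
    classify(targets[i], targets[i + 1], pitches[i], pitches[i + 1])
    for i in range(len(targets) - 1)
]
print(labels)  # ['oblique II', 'oblique II', 'oblique II', 'similar']
```

Only the final bigram, where the mid-level /3/ arrives on a falling major second, is a similar setting; it follows the run of oblique II bigrams with a change of tonal target, as the text describes.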

[28] It is possible that a descending stepwise sequence of notes could represent a repeated sequence of tones in Yorùbá, much as it does in Cantonese in Figure 7. A recent example of a largely “correct” (or at least intelligible) setting that includes both oblique I and oblique II settings comes from the hit Afrobeats song “Buga (Lo Lo Lo)” by Kizz Daniel featuring Tekno (2022). Figure 8 is a transcription of an excerpt of the first verse. Instead of resolving the descending oblique II setting of “lọ́-wọ́” (High-High) to a syllable with a lower tone level (Mid or Low), the songwriter code-switches into English. Using words that do not form tone-distinguished minimal pairs, and including words from other languages, are techniques explored further below. While the oblique I setting might be considered the most “incorrect” aspect of the setting, its tone change crosses a word boundary, and in this context intralexical tone is more important than interlexical tone (Carter-Ényì 2016).

Figure 8. An excerpt of Kizz Daniel ft. Tekno “Buga (Lo Lo Lo)” (22 June 2022). https://youtu.be/bLF90M96m2Q?feature=shared&t=17 (starting at timecode 00:17).

[29] While both Cantopop and Afrobeats may use oblique II settings, there seems to be a stronger constraint against repeated melodic notes in Cantonese than in Yorùbá. Repeated notes are regularly used in Yorùbá traditional music, and this feature has been sustained in westernized forms, including church music (see Figure 3) and contemporary popular music (see Figure 8). Here, then, we encounter a notable difference between Cantonese and Yorùbá. A characteristic of Cantopop is a paucity of level melodies, owing to the constraints associated with oblique I settings. This feature, which may be linguistically or musically motivated, seems to reflect a global constraint against repeated notes, not merely the avoidance of repeated notes when the tone is moving. In Yorùbá, on the other hand, repeated notes are musically permissible (or, put differently, there is no high-ranking constraint against them), so oblique I settings are much more likely to surface in Yorùbá melodies than in Cantonese ones, where they are constrained.

[30] Within oblique II settings, a melody could conceivably rise or fall while the underlying tone remains static. However, our three examples (two Cantonese and one Yorùbá) are all descending melodic settings of a repeated tone, not ascending ones. Both Carter-Ényì (2016) and Ladd (forthcoming) attribute this tendency in tone-based tunes to declination, a phrase-level downward pitch trajectory that is known to occur in both speech and non-tone-based songs and is linked to diminishing air pressure over the course of a phrase. David Huron (1996) found that the melodic arch (an overall rising trajectory in the first half of a melodic phrase and a falling trajectory in the second half) is the most common phrase-level shape in the Essen Folk Song Collection (a collection of Western folk songs, presumably not tone-based). The paralinguistic intonation of speech, in both tone and non-tone languages, exhibits declination (Ladd 2008). Thus, if speech in both tone and non-tone languages and music in non-tone languages all exhibit declination, it makes sense that singing in tone languages would also exhibit declination in westernized genres.

[31] Where the pitch range is lowered over the course of a phrase (whether for physiological or aesthetic reasons), even the oblique I setting in the first verse of “Buga” can become acceptable. The rising interlexical tone sequence of “yẹn lọ́-” (Mid-High) is set to a repeated note, but because the pitch height of the phrase decays rapidly (descending a minor seventh in less than a measure), the C4 on beat 4 is phonologically higher than the C4 on beat 3½. In Yorùbá, phrase declination is further reflected in the asymmetry between the magnitudes of rising and falling intervals between tone levels. In Figure 8, we observe that a two-semitone rise is sufficient for moving up one tone level, whereas three semitones are necessary to articulate unambiguously a move down of a single tone level. This magnitude asymmetry both reflects and supports phrase declination. “Buga (Lo Lo Lo)” thus illustrates the interaction of tone-tune mapping with oblique settings, magnitude asymmetry, and declination, in addition to code-switching.
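The claim about the two C4s can be made concrete by measuring each note against a falling baseline. The Python sketch below assumes a linear declination line, which is a simplification, and a hypothetical rate of 2.5 semitones per beat (a minor seventh, ten semitones, spread evenly over the four beats of a 4/4 measure). The text says only that the descent takes less than a measure, so the real rate is at least this steep.

```python
def declination_adjusted(notes: list[tuple[float, int]], rate: float) -> list[float]:
    """Return each note's pitch relative to a baseline falling `rate` semitones per beat.

    notes: (onset in beats, MIDI pitch) pairs; the baseline starts at the first onset.
    """
    start = notes[0][0]
    # Adding back the accumulated fall expresses each pitch relative to the baseline.
    return [pitch + rate * (onset - start) for onset, pitch in notes]


RATE = 2.5  # hypothetical: 10 semitones (a minor seventh) over 4 beats
repeated_c4 = [(3.5, 60), (4.0, 60)]  # C4 on beat 3.5 ("yẹn") and on beat 4 ("lọ́-")
print(declination_adjusted(repeated_c4, RATE))  # [60.0, 61.25]
```

Relative to this hypothetical baseline, the second C4 sits 1.25 semitones above the first, so a repeated pitch can be heard as the rise that the Mid-High sequence calls for.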

6) Contour Tones

[32] While the issue of contour tones has been addressed to some extent in Li (2021), it deserves further consideration in this cross-cultural analysis. First, we summarize the issue within Cantopop: even when tonal targets are correctly mapped to melody, native speakers may still perceive a mismatch if the word forms a minimal pair with another word that differs only in the onset of a tone. This phenomenon is not uncommon in Cantopop and should be taken seriously as a point of departure for new theoretical considerations, especially regarding level TTTs where the onset does not match the target. Ian Chan’s “Farewell, Sea of Tranquility” (再見寧靜海) (2023), reproduced in Figure 9, is an illustrative example. In this excerpt, the bigram 取消 (cancel) /ceoi25siu55/ has a TTT of 5-5, mapping perfectly onto two D5s. Yet a native speaker may find it difficult to identify this word, since its minimal pair 吹簫 (playing the vertical bamboo flute) /ceoi55siu55/, consisting of two level speech tones, matches a repeated melodic note equally well. Although “playing the bamboo flute” makes little sense semantically, it is what might be perceived phonologically. While /ceoi25siu55/ and /ceoi55siu55/ share the same TTT, intelligibility may be undermined because the first syllable of the intended word, /ceoi25/, carries a rising tone, not a level tone. This suggests that, in examining tone-tune mapping in Cantopop, one should consider not only the TTT but also the onset articulation of speech tones when there are minimal pairs differentiated by an onset tone (Lee 2023).

Figure 9. Perceptual mismatch in optimal tone-tune mapping in Ian Chan’s “Farewell, Sea of Tranquility” (再見寧靜海) (2023). https://youtu.be/zrhyrWjS73c?si=YmYOescO97uycVK2&t=170 (starting at timecode 2:50).
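Following the definition of TTT in note 4 (the transition between the final pitch values of successive tones), the contrast between 取消 and 吹簫 can be checked mechanically. The Python sketch below assumes the romanization used in this article (segmental letters followed by tone digits, as in /ceoi25siu55/). It takes the last digit of each tone as its target and the first digit as its onset. The function names are hypothetical. Whether a formally matching partner is an actual word (compare /gam55zuk5/, discussed in the next paragraph) would require a lexicon, which is not modeled here.

```python
import re

# A syllable is a run of lowercase letters followed by its tone digits,
# e.g. "ceoi25" (rising tone 25) or "zuk5" (checked tone 5).
SYLLABLE = re.compile(r"([a-z]+)(\d+)")


def parse(word: str) -> list:
    """Split a romanized word such as 'ceoi25siu55' into (segments, tone) pairs."""
    return SYLLABLE.findall(word)


def ttt(word: str) -> str:
    """Tonal target transition: the last digit of each syllable's tone."""
    return "-".join(tone[-1] for _, tone in parse(word))


def onset_minimal_pair(a: str, b: str) -> bool:
    """True if a and b share segments and TTT but differ in a tone onset."""
    pa, pb = parse(a), parse(b)
    if [seg for seg, _ in pa] != [seg for seg, _ in pb] or ttt(a) != ttt(b):
        return False
    return any(ta[0] != tb[0] for (_, ta), (_, tb) in zip(pa, pb))


print(ttt("ceoi25siu55"), ttt("ceoi55siu55"))            # 5-5 5-5
print(onset_minimal_pair("ceoi25siu55", "ceoi55siu55"))  # True
```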

[33] Another example by Ian Chan corroborates this point. Figure 10 reproduces an excerpt of the song “Distance” (2022). Like /ceoi25siu55/ in Figure 9, the bigram 感觸 /gam25zuk5/ has a TTT of 5-5. There are two reasons why this bigram is nonetheless perceptually intelligible. First, the first two syllables 會感 /wui23gam25/ provide an ascending TTT and an ascending melody (E-flat to F) that prepares the ascending pitch contour of the rising tone /gam25/. When Chan sings /gam25/, he glides into the tone, albeit extremely subtly, from E-flat at the onset to F. This pitch contour preparation does not occur in Figure 9. Second, 感觸 /gam25zuk5/ has no minimal pair in which the first tone is a level tone and the TTT is also 5-5. In other words, /gam55zuk5/ does not exist.

Figure 10. Optimal tone-tune mapping in Ian Chan’s “Distance” (2022). https://youtu.be/AsKuZrVZkoo?si=RNzId6xEcmCd9kee&t=53 (starting at timecode 00:53).

[34] In westernized Yorùbá popular music, including Highlife, Jùjú, and Afrobeats, it is typically the tonal target that is mapped from speech to song, as Chan (1987) observed for Cantonese. This is not the case, however, in traditional (indigenous) music and Islamicized music, in which stable-pitch singing is not the preferred aesthetic. “Ògún” (the Yorùbá god of iron) is pronounced L L-H in speech (with a glide from L to H on “-gún”) but would likely be simplified in Western musical notation as a rising fourth or fifth. Although the glide would likely not be represented in notation, in practice a singer might “scoop” (glide up to the higher note). Any scooping, however, might be eliminated by autotune in contemporary Nigerian commercial music. Much like the tonic sol-fa method, keyboard instruments and Western musical notation introduced practices of equal temperament and stable-pitch singing into Nigerian music during the colonial era (see Carter-Ényì 2018). Digital effects for automatic pitch correction (“autotune”) now technologically impose equal temperament and stable pitch on commercial recordings.
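The effect of hard pitch correction on a glide such as the one in “Ògún” can be approximated by snapping each frame of a fundamental-frequency (f0) track to the nearest equal-tempered semitone. The Python sketch below is a deliberately crude model, not a description of any commercial product. It assumes A4 = 440 Hz, instantaneous correction, and no scale restriction. The rising fourth from G3 to C4 in the example is a hypothetical realization of L to H. Commercial tools typically offer adjustable retune speed and other controls, so actual results vary.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def hz_to_midi(f0: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number."""
    return 69 + 12 * math.log2(f0 / A4_HZ)


def midi_to_hz(note: float) -> float:
    """Convert a (possibly fractional) MIDI note number to Hz."""
    return A4_HZ * 2 ** ((note - 69) / 12)


def hard_tune(f0_track: list) -> list:
    """Snap every f0 frame to the nearest equal-tempered semitone.

    Models only instantaneous correction to the chromatic scale.
    """
    return [midi_to_hz(round(hz_to_midi(f))) for f in f0_track]


# Hypothetical glide for "-gún": a continuous rise from G3 (MIDI 55)
# to C4 (MIDI 60), sampled in 21 frames.
glide = [midi_to_hz(55 + 5 * i / 20) for i in range(21)]
tuned = hard_tune(glide)

# The 21 distinct glide frequencies collapse onto six stable semitone pitches.
print(sorted({round(hz_to_midi(f)) for f in tuned}))  # [55, 56, 57, 58, 59, 60]
```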

[35] Contemporary Yorùbá popular music often includes stark juxtapositions of indigenous, contemporary, and even foreign musical practices. Sometimes these juxtapositions lead to what might best be termed polytonality. However, instead of two different diatonic keys sounding simultaneously, the indigenous tonality of Yorùbá chant or song is contrasted with Western instruments and tonal harmony. An example of juxtaposed tonalities and styles is an excerpt from a 2022 compilation of Islamic Yorùbá music (“Omije Ojumi”) in Figure 11, in which a syllabic setting is followed by a long melisma reflecting a combination of indigenous and Islamicized vocal practices, such as Qur’anic chant.

Figure 11. Excerpt of “Omije Ojumi Latest Islamic Yoruba Music Video.” https://www.youtube.com/watch?v=oCUhsmeVAQs&t=292s (starting at timecode 4:53).

[36] Autotune software leaves discernible digital artifacts, such as tone transitions and altered pitch contours, in the resulting audio recordings. Therefore, even with heavy use of digital effects, listeners can tell that glides were present in the raw, unprocessed performance. Aaron Carter-Ényì and David Àiná (2021) explain that the order in which tracks are recorded contributes to the juxtaposed tonalities described above. In Yorùbá Fuji, producers start with a virtual drum kit that acts as a click track; then the indigenous drums are recorded; then the lead and backup vocals and perhaps more drums (Fuji is very drum-heavy); and finally, Western instruments like keyboard and saxophone are layered. As a result, the singing and the Western instruments have distinct tonalities. In Yorùbá Gospel, by contrast, the Western instruments that commonly make up the “rhythm section” (drum set, keyboards, guitars, bass) are recorded or sequenced first. Even so, polytonality still occurs: in Tope Alabi’s “Mimọ Oluwa,” for example, a completely diatonic refrain is disrupted by a non-diatonic chanted Oríkì section.

7) Tone-Tune Independence

[37] Both Cantonese and Yorùbá musicians, while clearly attending to tone in many cases, also create moments of tone-tune independence. These are momentary relaxations of the linguistic constraints (against tone-tune mismatches, contrary settings, and even oblique settings) in which lyric intelligibility does not depend on tone. For instance, many Cantopop lyricists take advantage of words that do not form minimal pairs, because such words are less likely to create ambiguity. Those words, however, may not be set to an optimal TTT. One example, reproduced in Figure 12, is the last chorus of Hins Cheung’s “What Separates Us” (俏郎君) (2021), where 無法 /mou11faat3/, for which the optimal tone-tune mapping is a minor third, a perfect fourth, or a minor sixth, is mapped onto a major ninth, immediately followed by a step down. While the tone-tune mapping is non-optimal, hardly any other /moufaat/ makes sense in that linguistic context. Thus, although the bigram may sound bizarre to a native speaker, this somewhat exaggerated setting still preserves, to some extent, the intelligibility of the text.

[38] Another compositional method of loosening tone-tune constraints is cross-textuality (already demonstrated in Figure 8). Li (2021) has demonstrated cross-textuality between Cantonese, English, and Mandarin in the Cantonese pop duo FAMA’s “No Boundaries in the Sea of Knowledge” (2007); cross-textuality between Nigerian languages is also common in Yorùbá music. Figure 13 shows the multilingual gospel hit “Igwe” by Midnight Crew, which includes lead vocals in Yorùbá and English over an Igbo-language refrain. In Igbo, the segmental sequence /igwe/ has multiple meanings differentiated by tone (see Ekwueme 1974). But in multilingual settings, such as urban churches, only one meaning is known: “king,” usually referring to Jesus (Carter-Ényì 2016). Words commonly used in church music are typically understood from context (/igwe/ means king and not bicycle; /ike/ means strength and not buttocks), whether or not the pitch contour matches the tone bigram (see Carter-Ényì 2016 and Ofuani 2022).

Figure 12. Hins Cheung, “What Separates Us” (俏郎君) (2021). https://youtu.be/nK44fV_rGT8?si=U0C9IyPBXF2_tgh9&t=231 (starting at timecode 3:52).

Figure 13. Igbo refrain of the Midnight Crew’s “Igwe” (2009).

[39] Ironically, while J. J. Ransome-Kuti was pivotal in the development of Yorùbá-language church music in the 1920s, his grandson Fẹlá Aníkúlápó Kuti (born Ransome-Kuti, 1938–1997) played a key role in introducing Pidgin English lyrics (as an alternative to Yorùbá-language lyrics) into Nigerian popular music in the 1970s (Olaniyan 2001). Fẹlá was central to the fusion of Black American and West African musical aesthetics (and political activism) that he called Afrobeat, and one of its key characteristics was the use of Pidgin (or “broken”) English. While the sound of Nigerian popular music continues to evolve, particularly in its use of technology, the integration of street slang in Pidgin and Yorùbá (as well as other Nigerian languages) remains influential to this day.

[40] While there are major Afrobeats artists like Kizz Daniel who, through complex navigation, create highly intelligible lyrics (which include Yorùbá), there are also Afrobeats artists who are typically unconcerned with tone, except for certain words that the artist deems more important than others. Àṣàkẹ́’s music is an example. Àṣàkẹ́, the stage name of Ahmed Olalade, is also his mother’s name. When he was young, he was referred to as ọmọ Àṣàkẹ́ (child of Àṣàkẹ́, his mother), and this was sometimes shortened (because “children can be funny”) to Àṣàkẹ́ (without the ọmọ, “child of”). He adopted this as his stage name both to surprise people (that a man walked out on stage rather than a beautiful woman) and to honor his mother (Vanguard 2023).

[41] Figure 14, Àṣàkẹ́’s “Lonely At The Top” (2023), combines Pidgin English with Yorùbá. In this excerpt, the artist seems unconcerned with mapping speech tones onto the sung melody, with the exception of his own appellation. Yet as an important noun that is established over and over (in the credits and throughout the song), his name is among the words least likely to be misheard, so careful tone-tune mapping is hardly needed for its intelligibility. The carefully articulated tones of Àṣàkẹ́ instead suggest a reverence for the artist’s mother, in addition to the self-aggrandizement of calling one’s own stage name, which is common in varieties of hip-hop.

Figure 14. Àṣàkẹ́, “Lonely At The Top” (2023). https://www.youtube.com/watch?v=bbVZo4Yw7pI&t=73s (starting at timecode 1:13)

8) Summary of Findings

[42] Table 3 provides a summary of the cross-cultural analysis of tone-tune mapping and text-setting practices in Cantonese and Yorùbá.

[43] Western musical instruments and styles (such as European hymnody) have had a major impact on contemporary music, and, perhaps unsurprisingly, composers and songwriters have developed methods to preserve intelligibility in cosmopolitan musical genres. Drawing on these methods, both creators and researchers (sometimes in a dual role) have made recommendations about preserving intelligibility in westernized settings, as summarized in Table 4 (based on Chow 2012 for Cantonese, with similar information gathered from Carter-Ényì & Carter-Ényì 2016 for Yorùbá).

[44] While “hybrid” forms of music that combine indigenous languages with Western instruments and production styles are common in both Hong Kong and Lagos, there are new experiments in both cultures that diverge from the most Western-acculturated forms of text-setting. It is in these new styles, which experiment with microtonality (in Cantonese) and polytonality (in Yorùbá), that we see less concern with emulating Western melodic and harmonic practices and more with the expression of indigenous aesthetics. The adoption of Western melodic aesthetics brought a leveling of musical practice and of the representation of tone languages in song. Now, however, we see a cosmopolitan individuality among both Cantopop and Yorùbá pop artists, who are taking much more license in representing their respective languages in song.

| Feature | Cantonese | Yorùbá |
|---|---|---|
| Functional load of tone | Very high | Very high |
| Level tones | Present | Present |
| Contour tones | Common and phonemic | Present but not as common; not usually phonemic |
| Contour tones simplified to a tonal target for diatonic singing on stable pitches | Yes, on music that uses the Western diatonic scale; not in some traditional genres like Cantonese opera | Yes, on music that uses the Western diatonic scale; not in some traditional genres like Yorùbá Oríkì |
| Syllabic vs. melismatic settings | Syllabic settings are particularly common in westernized popular and Christian music | Syllabic settings are particularly common in westernized popular and Christianized music; Islamicized music is highly melismatic, suggesting that melismas do not disrupt intelligibility |
| Mapping to semitone intervals | Chow (2012) provides ideal intervals for lyric intelligibility in Cantopop; see Chan (2021) for the degree of intelligibility of intervals for tone successions | Similar ranges of intervals to those in Cantonese |
| Pitch reset at prosodic boundaries | Some constraints governing tone-tune mapping patterns can be lifted at prosodic boundaries | Some constraints governing tone-tune mapping patterns can be lifted at prosodic boundaries |
| Oblique settings | Cantopop shows a preference for oblique II settings (a rising or falling setting for a level tonal target transition) over oblique I settings (a rising or falling tonal target transition mapped onto a level melody) | Oblique I settings, where the musical melody remains the same while the tone changes, are more likely to be found in Yorùbá than in Cantonese due to the permissibility of repeated notes |
| Tone-tune independence | Cantonese musicians often take advantage of words that do not form minimal pairs to avoid ambiguity; cross-textuality between Cantonese, English, and Mandarin is also common | Yorùbá artists employ cross-textuality between Nigerian languages to expand lyrical possibilities |

Table 3. Comparison of major features of tone-tune mapping in Cantonese and Yorùbá.

| Cantonese tonal target transition | Tone realization in Cantonese | Yorùbá tone bigram (Low, Mid, High) | Tone realization in Yorùbá |
|---|---|---|---|
| 1-2 | 2/4 semitones | – | – |
| 1-3 | 3/5/8 semitones | Low-Mid | 2/3/4 semitones |
| 1-5 | 7/9/12 semitones | Low-High | 5 or more semitones |
| 2-3 | 1/3 semitones | – | – |
| 2-5 | 3/5/8 semitones | – | – |
| 3-5 | 2/4 semitones | Mid-High | 2/3/4 semitones |
| Level (e.g., 2-2) | 0 semitones | Level (e.g., Mid-Mid) | –2/–1/0/1 semitones |

Table 4. Mapping of ascending and level-tone bigrams to melodic intervals in Cantonese and Yorùbá (based on Chow 2012 and Carter-Ényì & Carter-Ényì 2016).
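Table 4 can be treated as a lookup table for checking a given setting. The Python sketch below encodes only the intervals listed in the table (ascending and level bigrams) and raises an error for bigrams the table does not cover, such as descending ones. The function names are hypothetical, and an interval's presence in the table indicates a recommended setting, not a guarantee of intelligibility. As a usage example, the sketch reproduces the case of 無法 /mou11faat3/ in Figure 12, whose TTT 1-3 is set to a major ninth (14 semitones) rather than to one of the listed intervals.

```python
# Intervals (in semitones) listed in Table 4 for ascending Cantonese TTTs.
CANTONESE = {
    "1-2": {2, 4},
    "1-3": {3, 5, 8},
    "1-5": {7, 9, 12},
    "2-3": {1, 3},
    "2-5": {3, 5, 8},
    "3-5": {2, 4},
}


def cantonese_listed(ttt: str, interval: int) -> bool:
    """Is this interval listed in Table 4 for the given Cantonese TTT?"""
    first, second = ttt.split("-")
    if first == second:
        return interval == 0          # level TTTs: repeated note
    if ttt not in CANTONESE:
        raise ValueError(f"TTT {ttt} is not covered by Table 4")
    return interval in CANTONESE[ttt]


def yoruba_listed(bigram: str, interval: int) -> bool:
    """Is this interval listed in Table 4 for the given Yorùbá tone bigram?"""
    first, second = bigram.split("-")
    if first == second:
        return -2 <= interval <= 1    # level: -2/-1/0/1 semitones
    if bigram == "Low-High":
        return interval >= 5          # "5 or more semitones"
    if bigram in ("Low-Mid", "Mid-High"):
        return 2 <= interval <= 4     # 2/3/4 semitones
    raise ValueError(f"bigram {bigram} is not covered by Table 4")


# 無法 /mou11faat3/ (TTT 1-3): a major ninth is not listed; a perfect fourth is.
print(cantonese_listed("1-3", 14), cantonese_listed("1-3", 5))  # False True
```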

Towards a Cross-Cultural Theory of Tone-Tune Mapping

[45] In this article, we have compared two distinct languages that share a common feature: tone has a high functional load, and contrary settings can cause perceptual ambiguity. Our goal is to gain a cross-cultural understanding of the listening experiences and aesthetics of tone-tune mapping. We find that an apophatic approach is germane because we can define what tone-tune setting is not more readily than what it is. In this cross-cultural study, while we have not found productive prescriptions for tone realization across cultures or even across styles within a single culture, we suggest that a constraint-based approach appears to be the most appropriate basis for a cross-cultural theory of tone-tune mapping. The most conspicuous constraint is to avoid “opposing” settings and, to a lesser extent, “non-opposing” settings. Assessing whether settings satisfy this constraint is a focus of studies in the linguistics literature (Ladd and Kirby 2020). In our analyses of tone-tune mapping in Cantonese and Yorùbá, we have found that this constraint matters most when a word is part of a tone-differentiated minimal pair and may potentially be confused with another word. In situations where there is no risk of word confusion, composers, songwriters, and vocal artists sometimes choose to distort the tone of a word (see Li 2021 and Carter-Ényì & Àiná 2021). This distortion can occur for various reasons, such as giving precedence to a musical element in a particular section of the composition, disregarding the precise setting of the text, or using wordplay in which the distorted text becomes the central focus.

[46] Li (2021) proposes that the experience of listening to singing in a tone language like Cantonese is like navigating a complex object through an anamorphic lens. “Anamorphic listening,” as he calls it, requires a hermeneutic valuing of categories. Some listeners may follow a process of categorical molding (when does speech/melody occur?), while others focus on categorical distinction (this is speech, and that is melody). Whether listening to tone-language singing involves categorical molding or categorical distinction deserves cross-cultural study. Tone-tune mapping in Cantopop, Li suggests, depends on factors other than tones and tunes, including the meaning of words, tempo, the manner of singing/speaking, the sentential/phrasal placement, the formality of language, the linguistic practice and habits of individual listeners, and so forth. Listening to Cantopop, Li argues, consists in an apophatic process that empathizes with “the sonic worlds of speech and melody, hearing them as relational semiotic containers emerging from or breaking through a speech-melody complex.” According to Li, every listening experience varies because it depends on what kind and how much contextual information the listener takes into account in parsing the “speech-melody complex,” thereby disarticulating “habitual speech-melody relationships” in tone-tune mapping.

[47] While the concept of cross-domain mapping from speech to song as an image recorded through an anamorphic lens is highly illustrative, we must acknowledge some nuances. A song in a tone language differs from an anamorphic lens in that a lens is a manufactured mechanical object that filters images, whereas speech and song are both produced by the human vocal tract. If singing is a lens, then, it is a variable anamorphic lens, not a fixed one. The variability occurs not only from artist to artist or from song to song but sometimes within a song (like Tope Alabi’s “Mimọ Oluwa,” mentioned earlier). If an artist consistently uses a specific scale (like the major scale) or autotune (or pitch-and-rhythm correction software like Melodyne), then the scale or the signal processing algorithms act very much like a fixed lens. However, we are finding that more artists are giving themselves room for tone realization, free from the limitations of autotune and equal temperament.

[48] Even among contemporary popular performers who more carefully represent the pitch contours of tone languages (like Cantopop artist Don Li, as examined in Li 2021, or Kizz Daniel), the sung realization of the tones is hardly ever completely true to their spoken realization. It is almost always filtered, even distorted, to some extent. If listening to sung tones is like viewing the distorted image produced by an anamorphic lens, then it is a critical activity: the sound demands that listeners actively correct the distorted image while simultaneously appreciating the aesthetic of the setting. Mappings from speech to song are rarely without an anamorphic lens.

[49] The notion that tone and tune can be mismatched but still intelligible may be optimistic. While mismatched (contrary) settings may be mentally corrected in otherwise clear contexts, they may also contribute to an overall mishearing of texts, particularly when meanings are not so crystalline (as in popular musics). While the body of analysis of tone-tune mapping is growing, further research is needed on listener perceptions and lyric intelligibility in recorded music (similar to Condit-Schultz & Huron 2015). Carter-Ényì & Carter-Ényì (2016) present findings on participants’ word identification and interval size (magnitude), and Sunday Ofuani (2022) on participants’ word identification within pre-existing musical compositions with matched (linguistic-tone-determined) and mismatched (musically or compositionally determined) settings of specific words. Ofuani’s study represents an important pursuit and clarifies that many potentially ambiguous words (like ike, which means strength or buttocks in Igbo) are now usually not confused. These results are to be expected, however; it is not only grammatical context but also social and musical context that clarifies meaning. For example, in a piece of Igbo choral music (which is usually Christian), ike means strength, not buttocks. Correctly identifying lyrics may be more difficult in popular musics than in religious or traditional music. This is consistent with both Schellenberg’s work on tone languages (2012) and Condit-Schultz & Huron’s study of Western genres (2015).

[50] As we mentioned at the outset of this article, McPherson and Ryan (2018) discuss tone-tune mapping within the framework of Optimality Theory, in which constraints on tone realization are ranked. A cross-cultural constraint, they find, is against tone-tune mismatches. We suggest that the ranking of this constraint depends globally on the functional load of tone within a language and locally on whether a word has tonemes that differentiate it from other words. Ladd (forthcoming) proposes that there are also global constraints on oblique I settings (a flat melodic setting of a moving tone sequence), which are also observed in Ho (2010). Other constraints may pertain not only to direction but also to interval size. Authors addressing Cantonese, as well as those addressing Yorùbá, have independently suggested that it is necessary to articulate the difference between an adjacent tone level and a non-adjacent tone level, for example, /25/ and /23/ in Cantonese or Mid-High and Low-High in Yorùbá. Although tonemic contrasts that have the same direction but different magnitudes of pitch change are less common, they do exist. Constraints can also extend beyond a single melodic line: Nigerian composers (such as Phillips 1952 and Ekwueme 1974), for instance, have posited constraints against contrary directions in harmonic (simultaneous) singing voices, such as soprano and alto.
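To make the constraint-based approach concrete, the Python sketch below evaluates candidate melodies for one tone sequence in the manner of Optimality Theory. Each candidate receives a violation count for each constraint, and candidates are compared under strict domination, which amounts to lexicographic comparison of their violation profiles. The three constraints and their ranking (*CONTRARY above *OBLIQUE-I above *OBLIQUE-II, loosely reflecting the Cantopop preference for oblique II over oblique I settings summarized in Table 3) are our hypothetical illustration. They are not a ranking proposed by McPherson and Ryan (2018) or Ladd (forthcoming), and the tone levels and pitches are invented.

```python
def _direction(a: int, b: int) -> int:
    """+1 for a rise, -1 for a fall, 0 for no change."""
    return (b > a) - (b < a)


def violations(tones, notes) -> dict:
    """Count violations of each (hypothetical) constraint across transitions."""
    counts = {"*CONTRARY": 0, "*OBLIQUE-I": 0, "*OBLIQUE-II": 0}
    for i in range(len(tones) - 1):
        t = _direction(tones[i], tones[i + 1])
        m = _direction(notes[i], notes[i + 1])
        if t == m:
            continue                      # parallel setting: no violation
        if m == 0:
            counts["*OBLIQUE-I"] += 1     # moving tone, repeated note
        elif t == 0:
            counts["*OBLIQUE-II"] += 1    # level tone, moving melody
        else:
            counts["*CONTRARY"] += 1      # opposite directions
    return counts


# Hypothetical ranking, highest first (strict domination).
RANKING = ("*CONTRARY", "*OBLIQUE-I", "*OBLIQUE-II")


def evaluate(tones, candidates, ranking=RANKING):
    """Return the optimal candidate and every candidate's violation profile."""
    profiles = {}
    for melody in candidates:
        counts = violations(tones, melody)
        profiles[melody] = tuple(counts[name] for name in ranking)
    # Tuples compare lexicographically, which implements strict domination.
    winner = min(candidates, key=profiles.__getitem__)
    return winner, profiles


tones = (1, 3, 3)  # a rising, then level, tone sequence (hypothetical)
candidates = [(60, 60, 57), (60, 64, 62), (64, 60, 60)]
winner, profiles = evaluate(tones, candidates)
print(winner)  # (60, 64, 62)
print(profiles)
```

Reordering the ranking changes the outcome: with *OBLIQUE-II ranked highest and *CONTRARY lowest, the candidate (64, 60, 60) would win instead. This illustrates that the result depends on the ranking as well as on the constraints themselves, in line with the suggestion above that rankings vary with functional load and with the individual word.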

[51] One of the most promising directions for theorizing cross-cultural tone-tune mapping, particularly with a constraint-based approach, is to further explore the intersection of oblique II settings (Ho 2010; Ladd forthcoming), interval size, phrasal/prosodic boundaries, magnitude asymmetry (Carter-Ényì & Carter-Ényì 2016), and declination (Carter-Ényì 2016; Ladd forthcoming). These features are related to the constraint against rising melodic intervals in Tommo So observed by McPherson & Ryan (2018). Similarly, in a cross-cultural study of melodies, Huron (1996) found that upward leaps are unlikely to occur late in a musical phrase. All of these observations are consistent with phrase declination, a cross-cultural and cross-domain phenomenon in which pitch level is generally higher within the voice range at the beginning of a phrase than at the end. Localized constraints that pertain to the beginning of musical phrases versus the end, or to words with or without tonemes, are needed to fully develop a model for any genre within a culture as well as across cultures. Overall, our qualitative study suggests that while it is unlikely that cross-language tone-tune mapping models can be operationalized without sacrificing local musical-linguistic nuances, global preference rules do emerge from localized constraints.
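The two phrase-level tendencies mentioned here, overall declination and the rarity of late upward leaps, can both be measured on a transcribed melody. The Python sketch below is a hedged illustration, not the procedure of Huron (1996) or Ladd (forthcoming). It estimates declination as the ordinary least-squares slope of pitch (in MIDI semitones) against onset time (in beats), and it counts rising intervals of at least three semitones in the second half of the phrase. The three-semitone leap threshold, the halfway cutoff, the function names, and the sample phrase are all assumptions.

```python
def declination_slope(onsets, pitches) -> float:
    """Least-squares slope of pitch against time; negative means declination."""
    n = len(onsets)
    if n < 2 or n != len(pitches):
        raise ValueError("need at least two matching (onset, pitch) pairs")
    mean_t = sum(onsets) / n
    mean_p = sum(pitches) / n
    s_tt = sum((t - mean_t) ** 2 for t in onsets)
    if s_tt == 0:
        raise ValueError("onsets must not all be equal")
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(onsets, pitches))
    return s_tp / s_tt


def late_upward_leaps(onsets, pitches, leap=3, late_fraction=0.5) -> int:
    """Count rises of at least `leap` semitones landing late in the phrase."""
    cutoff = onsets[0] + late_fraction * (onsets[-1] - onsets[0])
    return sum(
        1
        for i in range(1, len(pitches))
        if onsets[i] >= cutoff and pitches[i] - pitches[i - 1] >= leap
    )


# Hypothetical arch-shaped phrase: a short rise, then a longer fall.
onsets = [0, 1, 2, 3, 4, 5]
pitches = [67, 69, 67, 64, 62, 60]
print(round(declination_slope(onsets, pitches), 3))  # -1.686
print(late_upward_leaps(onsets, pitches))            # 0
```

A single regression line deliberately flattens the arch shape that Huron describes; comparing slopes across phrases, or fitting the two halves separately, would be natural refinements.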


REFERENCES

“BLACK EUROPE: The Sounds And Images Of Black People In Europe – Pre 1927.” Accessed 9 October 2023. http://black-europe.com/.

“Over There! Sounds and Images of Black Europe.” Accessed 8 October 2023. https://music.apple.com/us/album/over-there-sounds-and-images-of-black-europe/734596874.

“Why I adopted my mom’s name as stage name – Asake.” Vanguard, accessed 19 December 2023. https://www.vanguardngr.com/2023/09/why-i-adopted-my-moms-name-as-stage-name-asake/.

Abraham, Roy C. 1962. Dictionary of Modern Yoruba. London: Hodder and Stoughton.

Adegbite, Ademola Thomas. 1978. “Oriki: A Study of Yoruba Musical and Social Perception.” PhD diss., University of Pittsburgh.

Bámgbóṣé, Ayọ̀. 2000. A Grammar of Yorùbá, vol. 5. Cambridge: Cambridge University Press.

Carter-Ényì, Aaron. 2016. “Contour Levels: An Abstraction of Pitch Space based on African Tone Systems.” PhD diss., The Ohio State University.

———. 2018. “Hooked on Sol-Fa: The do-re-mi Heuristic for Yorùbá Speech Tones.” Africa 88(2): 267–90.

———. 2021. “Tone Realization and Register Transformations in Nigerian Art Music: A Formal Analysis of Èkwúèmé and Olúrántí.” Perspectives of New Music 59(2): 31–79.

Carter-Ényì, Aaron, and David Àiná. 2021. “Tonal Counterpoint Revisited: From Yorùbá Pop to American Hip-Hop.” Analytical Approaches to World Music 9(2). https://journal.iftawm.org/previous/vol9no2/carterenyi-aina/

Carter-Ényì, Aaron, and Quintina Carter-Ényì. 2016. “Perception of Syntagmatic Tone Intervals.” In Tonal Aspects of Languages: 5th International Symposium, Buffalo, New York, 107–10.

———. 2020. “Melodic Language and Linguistic Melodies: Text Setting in Igbo.” SMT-V: The Society for Music Theory Videocast Journal 6(5). https://vimeo.com/448178213

Carter-Ényì, Quintina, and Aaron Carter-Ényì. 2019. “Thirteen Ways to ‘Hail, Mary’: A Case Study of the 2013 Forum for the Inculturation of Liturgical Music in Nigeria.” Yale Journal of Music & Religion 5(1): 35–50.

Chan, Kai-Young. 2021. “From Constraints to Creativity: Musical Inventions through Cantonese Contours in Hong Kong Contemporary Music.” Principles of Music Composing: Phenomenon of Creativity 21: 41–59.

Chan, Marjorie K. 1987. “Tone and Melody in Cantonese.” In Proceedings of the Thirteenth Annual Meeting of the Berkeley Linguistics Society, vol. 13, Berkeley, CA, 26–37.

Chow, Man-ying. 2012. “Singing the Right Tones of the Words: The Principles and Poetics of Tone-melody Mapping in Cantopop.” MPhil thesis, The University of Hong Kong.

Condit-Schultz, Nathaniel, and David Huron. 2015. “Catching the Lyrics: Intelligibility in Twelve Song Genres.” Music Perception: An Interdisciplinary Journal 32(5): 470–83.

Crowther, Samuel. 1852. A Grammar of the Yorùbá Language. London: Seeleys.

Delano, Isaac O. 1942. The Singing Minister of Nigeria: The Life of the Rev. Canon J. J. Ransome-Kuti. London: United Society for Christian Literature.

Ekwueme, Laz E. N. 1974. “Linguistic Determinants of Some Igbo Musical Properties.” Journal of African Studies 1(3): 335–53.

Euba, Akin. 2001. “Text Setting in African Composition.” Research in African Literatures 32(2): 119–32.

Goldsmith, John A. 1976. “Autosegmental Phonology.” PhD diss., Massachusetts Institute of Technology.

Ho, Wing-see Vincie. 2010. “A Phonological Study of the Tone-melody Correspondence in Cantonese Pop Music.” PhD diss., The University of Hong Kong.

Huron, David. 1996. “The Melodic Arch in Western Folksongs.” Computing in Musicology 10: 3–23.

Kan, Joshua Ching Yuet. 2023. “Hong Kong Christian Songwriters’ Dilemma: Juggling Sacred Music, Tonal Language, and Christian Faith.” Analytical Approaches to World Music 11(1). https://journal.iftawm.org/previous/2023-volume-11-no-1/kan/

King, Robert D. 1967. “Functional Load and Sound Change.” Language 43(4): 831–52.

Kirby, James. 2021. “Towards a Comparative History of Tonal Text-setting Practices in Southeast Asia.” In Transcultural Music History: Global Participation and Regional Diversity in the Modern Age, edited by Reinhard Strohm, 291–312. Berlin: Berliner Wissenschafts-Verlag.

Kirby, James, and D. Robert Ladd. 2016. “Tone-melody Correspondence in Vietnamese Popular Song.” In Tonal Aspects of Languages: 5th International Symposium, Buffalo, New York, 48–51.

Ladd, D. Robert. 2008. Intonational Phonology. Cambridge: Cambridge University Press.

———. Forthcoming. “Two Problems in Theories of Tone-melody Matching.” Studies in Prosodic Grammar, vol. 9.

Ladd, D. Robert, and James P. Kirby. 2020. “Tone-melody Matching in Tone Language Singing.” In The Oxford Handbook of Language Prosody, edited by Carlos Gussenhoven and Aoju Chen, 676–87. New York: Oxford University Press.

Leben, William R. 1983. “The Correspondence between Linguistic Tone and Musical Melody.” In Proceedings of the Ninth Annual Meeting of the Berkeley Linguistics Society, Berkeley, CA, 148–57.

Lee, Yin Hei (Jason). 2023. “A Corpus Study of Tone-Melody Correspondence in Cantopop, 2000–2020.” Master of Arts thesis, The University of British Columbia.

Li, Edwin K. C. 2021. “Cantopop and Speech-Melody Complex.” Music Theory Online 27(1). http://dx.doi.org/10.30535/mto.27.1.6

McCarthy, John J. 2002. A Thematic Guide to Optimality Theory. Cambridge: Cambridge University Press.

McPherson, Laura, and Kevin M. Ryan. 2018. “Tone-tune Association in Tommo So (Dogon) Folk Songs.” Language 94(1): 119–56.

Ofuani, Sunday. 2022. “Can the Intended Messages of Mismatched Lexical Tone in Igbo Music Be Understood? A Test for Listeners’ Perception of the Matched Versus Mismatched Compositions.” Music Perception 39(4): 371–85.

Olaniyan, Tejumola. 2001. “The Cosmopolitan Nativist: Fela Anikulapo-Kuti and the Antinomies of Postcolonial Modernity.” Research in African Literatures 32(2): 76–89.

Oluranti, Stephen Ayodamope. 2012. “Polyrhythm as an Integral Feature of African Pianism: Analysis of Piano Works by Akin Euba, Gyorgy Ligeti & Joshua Uzoigwe and Àjùlo Kìnìún (Original Composition).” PhD diss., University of Pittsburgh.

Phillips, Thomas King Ekundayo. 1952. Yorùbá Music (African): Fusion of Speech and Music. Roodepoort: African Music Society.

Pooley, Thomas M. 2020. “Linguistic Tone and Melody in the Singing of Sub-Saharan Africa.” In The Routledge Companion to Interdisciplinary Studies in Singing, volume I: Development, edited by Frank A. Russo, Beatriz Ilari, and Annabel J. Cohen, 108–20. New York: Routledge.

Ransome-Kuti, J. J. 1923. Iwe Orin Mimọ Fun Ijo Enia Ọlọrun Ni Ilẹ Yorùbá. Aberdeen, UK: Church of The Province of West Africa.

Richards, Paul. 1972. “A Quantitative Analysis of the Relationship between Language Tone and Melody in a Hausa Song.” African Language Studies 13: 137–61.

Schellenberg, Murray. 2012. “Does Language Determine Music in Tone Languages?” Ethnomusicology 56(2): 266–78.

Schirmer, Annett, Siu-Lam Tang, Trevor B. Penney, Thomas C. Gunter, and Hsuan-Chih Chen. 2005. “Brain Responses to Segmentally and Tonally Induced Semantic Violations in Cantonese.” Journal of Cognitive Neuroscience 17(1): 1–12.

Sleeper, Morgan, and Griselda Reyes Basurto. 2022. “Musicolinguistic documentation: Tone & Tune in Tlahuapa Tù’un Sàví Songs.” Language Documentation & Conservation 16: 168–208.

Tanprasert, Teerapaun, and Joti Rockwell. 2021. “Sounding Thai: Instrumental Translation, Language-Melody Correlation, and Vocal Expressivity in Thai Sakon Music from the 2010s.” Analytical Approaches to World Music 9(2): 1–33.

Welmers, William E. 1973. African Language Structures. Berkeley: University of California Press.

Yung, Bell. 1989. Cantonese Opera: Performance as Creative Process. Cambridge: Cambridge University Press.

[1]. We thank Samuel Hong-yu Leung for typesetting the Cantonese musical examples in this article, and the reviewers for their valuable suggestions.

[2]. In linguistics, functional load refers to “a measure of the work which two phonemes (or a distinctive feature) do in keeping utterances apart—in other words, a gauge of the frequency with which two phonemes contrast in all possible environments” (King 1967, 831).

[3]. For a discussion on Cantonese tones and systems of categorization, see Chow (2012, 8–23).

[4]. “Tonal target transition” (TTT) should be distinguished from “bigram.” The former specifically refers to the transition between the finals of successive speech tones, while the latter refers to a pair of consecutive units (e.g., syllables). For instance, we can refer to 故事 /gu33si22/ as a bigram, whose TTT is 3-2.

[5]. In this and all subsequent scored examples, the melody is transcribed in Western staff notation, while the text with implied tones (as the text would be correctly spoken) appears as lyric tiers below the melodic transcription. None of the other musical examples include strophic texts (distinct texts sung to the same melody), so the underlining annotation applies only to this example.

[6]. Follow this link and scroll down to “Disc 3” to hear Ransome-Kuti’s “Native Airs” as recorded in 1922: https://music.apple.com/us/album/over-there-sounds-and-images-of-black-europe/734596874

[7]. This contrasts with languages within the same Niger-Congo family. For example, in Igbo there is far less glide between tones.

[8]. See Chan (2021) for tables on intelligible intervals for Cantonese tone successions based on perception tests conducted in 2020–21. The tests include intervals that are not commonly found in Cantopop, including minor seventh, major seventh, and major ninth.

[9]. The information on pitch heights and C3s is not included in Chan (2021). The authors obtained the information from personal communication with Chan on 8 March 2024.