ISSN 2158-5296

Analytical Approaches to World Musics

Dawson et al. 2026

AAWM Journal 14/No. 1 (2026)

Simple Artificial Neural Networks can use Cantometrics to Assign Folk Songs to Cultural Regions

Michael R.W. Dawson, Michael Frishkopf, Erron J. Meneses, and Kezziah C. Ayuno

keywords; keywords; keywords

Abstract Abstract Abstract Abstract

Michael R.W. Dawson is

Michael Frishkopf is

Erron J. Meneses is

Kezziah C. Ayuno is




Introduction

[1] We call brain-inspired computer simulations built from layers of interconnected processors artificial neural networks (ANNs). Modern ANNs, called deep neural networks (DNNs), use an input unit layer to represent stimuli and an output unit layer to represent responses. Between its input and output layers, a DNN has many layers of intermediate processors called hidden units. DNNs convert stimuli into desired responses by using hidden unit layers to detect appropriate stimulus features (Hinton, Osindero, and Teh 2006; LeCun, Bengio, and Hinton 2015).

[2] DNNs can make very abstract responses to raw stimuli. Consider DNNs that classify music. Researchers encode such musical stimuli as raw sound patterns, such as spectrograms. When presented with such raw inputs, DNNs can classify musical genres, make musical recommendations, separate different musical sources, detect singing voices, recognize musical instruments, classify emotions conveyed by music, or transcribe musical sounds into musical notation (Briot 2021; Humphrey, Bello, and LeCun 2013; Moysis et al. 2023).

[3] How do DNNs accomplish such feats? Each successive layer of a DNN’s hidden units detects more abstract features than the layer before it. For example, a DNN that classifies faces might use early layers of hidden units to detect simple visual features but use later layers of hidden units to detect complex feature patterns (Mousavi et al. 2016).

[4] Researchers have used DNNs to perform tasks related to ethnomusicology. One example used a set of 1,800 folk songs: 300 songs from each of six different cultural regions (Guo et al. 2018). Guo et al.’s DNN learned to assign a presented song, encoded as a spectrogram, to its cultural region. At the end of training, the DNN assigned songs to regions with 80.7% accuracy, excellent performance on a challenging ethnomusicological problem. However, can such a DNN inform or advance ethnomusicological theory?

[5] To use DNNs to advance theory, researchers must look inside networks to understand the complex features DNNs use to generate responses. However, the many-layered structure of DNNs makes a network’s internal properties extremely difficult to interpret (Montavon, Samek, and Muller 2018). DNNs suffer from what researchers call the ‘black box problem’ because researchers cannot explain how a DNN’s internal structure produces its responses (Adadi and Berrada 2018; Ghassemi, Oakden-Rayner, and Beam 2021; Zhang et al. 2018). Research on explainable artificial intelligence (XAI) attempts to solve the black box problem (Adadi and Berrada 2018; Ali et al. 2023; Angelov et al. 2021; Arrieta et al. 2020; Confalonieri et al. 2021; Deeks 2019; Minh et al. 2022; Tjoa and Guan 2021; de Bruijn, Warnier, and Janssen 2022). The goal of XAI is to interpret the structure of complex AI systems like DNNs and to explain how they work to a user, or even to the general public. While reviews of XAI show it is making progress, it still faces many difficulties, with the complexity of DNNs posing the greatest challenge. In short, because of the complexity of DNNs and the current state of XAI, researchers rarely explore the internal representations discovered by DNNs and therefore rarely use DNNs to advance or inform theory.

[6] How, then, might we use ANNs to inform theory? Recall that DNNs succeed by discovering abstract features. What if we started by using more abstract features to represent stimuli, instead of a raw format? If we provide relevant features before training begins, then simpler ANNs could learn to classify stimuli, because networks would no longer need multiple layers of hidden units to detect the features we have already provided. In turn, we can interpret simpler networks and use their structure to contribute to theory, including music theory (Dawson 2009, 2018; Dawson, Perez, and Sylvestre 2020; Perez et al. 2023).

[7] We use the current paper to illustrate such an approach. We return to the ethnomusicological problem studied by Guo et al. (2018): assigning folk songs to cultural regions. However, we do not represent songs as raw spectrograms. Instead, we represent songs as sets of features, using Cantometrics, a method developed by Alan Lomax to represent sonic properties of songs (Lomax 1959, 1962, 1967, 1968, 1977; Lomax et al. 1976; Savage 2018; Wood 2018b, 2018a). By representing each song with its sonic features, as defined by Cantometrics, we simplify the task of assigning songs to cultural regions. In spite of Cantometrics’ long history, we have found no research showing that Cantometric features can be used to assign songs to regions. We contribute to Cantometrics research by showing that an ANN with no hidden units can learn to assign songs to regions with high accuracy using Cantometric features. We then show how the structure of our simpler ANN can inform Cantometric theory.

[8] Our paper proceeds as follows: First, we provide a brief overview of Cantometrics. Second, we introduce a simple ANN, a modern version of the perceptron (Dawson 2008, 2022a). Third, we show how perceptrons can learn to use Cantometric features to assign songs to cultural regions with high accuracy. Fourth, we explore how perceptron properties can inform Cantometrics. Finally, we consider future research questions motivated by the results we report below.

Cantometrics

[9] Ethnomusicologists often wish to use song properties to identify cultures or cultural regions (Freeman and Merriam 1956; Lena and Peterson 2008). Alan Lomax’s Cantometrics provides one example of such research (Lomax 1959, 1962, 1967, 1968, 1977; Lomax et al. 1976; Savage 2018; Wood 2018b, 2018a). Lomax developed Cantometrics to measure properties of song performances, as detailed below. He related the Cantometric scores of songs to properties of the songs’ cultures, as drawn from George P. Murdock’s Ethnographic Atlas (Murdock 1967b, 1967a), and concluded that “song style is an excellent indicator of cultural pattern” (Lomax 1968, 3).

[10] Lomax developed Cantometrics as a scoring system which 1) made song performance properties explicit, 2) could be applied to recorded songs, 3) could be taught to different scorers and administered with high inter-rater reliability, and 4) could be related to cultural measures. Cantometrics evaluates thirty-seven different sonic characteristics, or features (Table 1). Each feature is scored using a set of possible values, or states. Not all features have the same number of states (see the differing numbers of checkmarks for each feature in Table 1). Some (e.g., Feature 15: Melodic Shape) have only four states, while others (e.g., Feature 16: Melodic Form) have thirteen.

Table 1. The 37 features used for the Cantometric scoring of a song. Each feature can be assigned one of a finite number of states, but the number of states varies from feature to feature. The checkmarks in the table below indicate how many different states can be assigned to each feature.

[11] In Table 1, each state represents a possible value for a particular feature. Ordinarily, a song is assigned only one state for each feature. However, a song is sometimes scored as having more than one state for a feature, for instance when one state holds early in the song and a different state holds later. Table 2 provides possible states for three example Cantometric features.

Table 2. State values for three examples of the 37 Cantometric features.

 

[12] Lomax organized the states for each Cantometric feature in a Likert-like scale (Savage 2018); some scales are ordinal (e.g., F1 in Table 2), while others are nominal (e.g., F15 in Table 2). In general, the ordering of feature states reflects an individual-to-integrated continuum. Lower state values represent properties consistent with an individualized song “in which a solo singer commands the communication space by presenting a pattern that is too complex for participation” (Lomax 1968, 16). In contrast, higher state values represent properties consistent with a song “in which all those present can join in easily because of the relative simplicity and repetitiousness of the patterns” (Lomax 1968, 16). Lomax believed song structure, and its relationship to cultural organization, was best understood through the dichotomy of individual versus integrated. He therefore built this dichotomy directly into his scoring scheme.

[13] Individual songs were scored on a sheet; each row on the sheet corresponded to a Cantometric feature, and each column represented a feature’s state. A song was scored by circling, for each feature, the states present in the song. Lines were then drawn to link the circled states, producing a song profile (see examples in Lomax 1968, 24–27). Lomax used the similarity between song profiles to group similar songs together (Lomax 1968; Lomax et al. 1976). At first, song groupings were based on visual inspection of song profiles; later, factor analysis was used to group similar songs objectively. Lomax also used factor analysis to analyze similarities between songs after each song had been assigned a culture of origin, with over 400 different cultures used in his analysis (Lomax 1976, Chapter 2). Similar song styles from different cultures could then be grouped together. This permitted Lomax to relate his song style classifications to Murdock’s (1967a, 1967b) Ethnographic Atlas, revealing ten major cultural regions of song types.

[14] In this paper, we use Cantometric features to provide an initial representation of stimuli (songs), rather than the raw spectrograms used by Guo et al. (2018). We train a simple ANN, the perceptron, to use Cantometric features to assign songs to cultural regions. We find that Cantometric features provide enough information for a simple network to assign songs to cultural regions accurately. To our knowledge, this is the first demonstration that a computational method can use Cantometric features to predict regions. The next section introduces a perceptron’s basic properties.

The Perceptron

[15] Modern ANNs use intermediate layers of hidden units. However, early ANNs were much simpler. One famous network, the perceptron, had no hidden units at all (Rosenblatt 1958, 1962). In a typical perceptron, input units were directly connected to an output unit which generated a binary response for classifying stimuli. Rosenblatt developed a training procedure that used response errors to alter the network’s connection weights. However, the perceptron’s popularity waned because, without hidden units, perceptrons were unable to learn complex classification tasks (Minsky and Papert 1969).

[16] Modern perceptrons (Dawson 2004, 2008) replace binary responses with continuous responses, often use multiple output units, and learn via gradient descent techniques related to the rules used to train modern ANNs with hidden units. Modern perceptrons provide insights into many phenomena including associative learning, probability learning, navigation, and musical cognition (Dawson 2008, 2018, 2022a; Dawson and Dupuis 2012; Dawson et al. 2009; Dawson and Gupta 2017; Dawson and Zielinski 2018). The ANNs we discuss below are all modern perceptrons like the one illustrated in Figure 1.

Figure 1. A perceptron trained to use Cantometric states to assign folk songs to cultural regions. The input units indicate whether a feature state is present. If the state represented by an input unit is present in a song, the unit is activated with a value of 1; otherwise, it is activated with 0. Input unit activities are sent through modifiable connections; an activity is multiplied by the current weight associated with a connection. The output units indicate regions. Each output unit sums the weighted activities sent to it by input units and converts the sum to an activity between 0 and 1 using the logistic equation. Output unit activity is compared to desired activity in order to calculate error, which is used to update the perceptron’s connection weights. The figure illustrates the sets of input units used to represent the three Cantometric features detailed in Table 2; the remaining input units are omitted from the figure. The perceptrons we studied learned to assign stimulus songs to one of six different cultural regions (R1, R2, etc.). See text for details.

[17] A perceptron’s input units represent stimuli. In general, input units represent stimuli by being activated; activation is a numerical value indicating the presence, or possibly the magnitude, of a stimulus feature. When activated, input units send signals through weighted connections to output units. A signal is an input unit’s activity multiplied by the weight of the connection through which the signal is sent. An output unit processes signals as follows: First, it sums signals coming from different input units to determine the total signal, called the net input. Second, it converts net input into activity with a nonlinear activation function. Output units use a sigmoid-shaped activation function defined by the logistic function provided in Equation 1:

$$a = \frac{1}{1 + e^{-\mathrm{net} + \theta}} \qquad (1)$$

[18] In Equation 1, a is activity, net is the unit’s net input and θ is the unit’s bias, analogous to a threshold. The logistic function ‘squashes’ net input into a range between 0 and 1.
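A minimal Python sketch of Equation 1 may help readers check the function’s behaviour. The name `logistic` is our own illustrative choice, not part of any published simulation code; the function evaluates the equation as written, with the bias θ subtracted from the net input.

```python
import math


def logistic(net: float, theta: float = 0.0) -> float:
    """Equation 1: convert net input into an activity between 0 and 1.

    theta is the unit's bias; it is subtracted from the net input, so it
    acts like a threshold that shifts the curve along the net-input axis.
    """
    return 1.0 / (1.0 + math.exp(-net + theta))


# When net input equals the bias, activity sits exactly at the midpoint.
print(logistic(0.0))       # 0.5
print(logistic(2.0, 2.0))  # 0.5
# Large positive net input pushes activity toward 1, large negative toward 0.
print(logistic(6.0) > 0.99, logistic(-6.0) < 0.01)  # True True
```

Because θ only shifts the curve, a unit with a larger bias needs a larger net input to reach the same activity.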

[19] An output unit’s activity represents the unit’s response to input unit activities. In Figure 1, each input unit represents a state of a Cantometric feature. We used 222 input units to encode all possible states of the thirty-seven different Cantometric features. Each output unit represents a cultural region to which a song could be assigned, with six output units representing six different cultural regions. We train the Figure 1 network to turn one output unit on (the unit corresponding to the song’s region) and the other output units off when a song’s Cantometric features are presented to the perceptron.

[20] How does a perceptron acquire the ability to convert features into regions? The connection weights in the Figure 1 perceptron begin as small random values that are then modified through training. Training involves teaching the network with a training set, a collection of stimulus-response pairs. While learning, the network receives a stimulus, causing the output units to activate. Error is computed by taking the difference between the observed responses and the desired responses associated with the stimulus in the training set. Error is then used to modify the network’s connection weights so that network error decreases the next time the stimuli are presented.

[21] More formally, we use a gradient descent rule to train the perceptrons (e.g., Dawson 2004, 2008, 2022). The rule changes the weight of each connection between an input unit and an output unit, as provided in Equation 2:

$$\Delta w_{ij} = \eta \cdot a_i \cdot f'(\mathrm{net}_j) \cdot \delta_j \qquad (2)$$

[22] In Equation 2, $\Delta w_{ij}$ is the amount by which learning changes the connection weight between input unit $i$ and output unit $j$. $\eta$ is a learning rate, a fractional value which scales how much learning can occur at any given moment. $a_i$ is the activity of input unit $i$. $f'(\mathrm{net}_j)$ is the first derivative of output unit $j$’s activation function; for the logistic equation, the first derivative is $a_j \cdot (1 - a_j)$, where $a_j$ is the activity of output unit $j$. Finally, $\delta_j$ is the error for output unit $j$; error is calculated by subtracting $a_j$ from the desired activity for output unit $j$.
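To make Equation 2 concrete, the sketch below computes a single weight change. The function `delta_w` and its argument names are hypothetical illustrations of the equation, not the software used in our simulations.

```python
def delta_w(eta: float, a_i: float, a_j: float, desired_j: float) -> float:
    """Equation 2: change to the weight from input unit i to output unit j."""
    f_prime = a_j * (1.0 - a_j)   # derivative of the logistic, written via activity
    delta_j = desired_j - a_j     # error: desired activity minus observed activity
    return eta * a_i * f_prime * delta_j


# Worked example: learning rate 0.1, an active input, and an output at 0.5
# that should be 1: 0.1 * 1 * 0.25 * 0.5 = 0.0125.
print(delta_w(0.1, 1.0, 0.5, 1.0))  # 0.0125
# An inactive input (a_i = 0) leaves its weight unchanged.
print(delta_w(0.1, 0.0, 0.5, 1.0))  # 0.0
```

The second call shows a consequence of 0/1 input coding: on any trial, only the connections from input units representing states present in the current song can change.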

[23] Repeatedly presenting the training set, and repeatedly modifying connection weights using Equation 2, improves perceptron performance by reducing output unit errors. If a perceptron can represent a solution to a problem, training can find it; for Rosenblatt’s original perceptron, this is guaranteed by the perceptron convergence theorem (Rosenblatt 1962). If a perceptron cannot represent a solution to the problem, error settles at a minimum above zero: the network continues to generate errors to some stimuli, and further training will not drive network error to zero. We now report the results of training perceptrons to use Cantometric features to assign songs to cultural regions.
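A classic illustration of a problem a perceptron cannot represent, not drawn from our Cantometric data, is exclusive-or (XOR) over two binary inputs (Minsky and Papert 1969). Write $z = w_1 x_1 + w_2 x_2 - \theta$ for the argument of the logistic in Equation 1, so that activity exceeds 0.5 only when $z > 0$. Correct XOR responses from a single output unit would require all four of the following:

$$
\begin{aligned}
(0,0) \mapsto 0: &\quad -\theta < 0 \\
(1,1) \mapsto 0: &\quad w_1 + w_2 - \theta < 0 \\
(1,0) \mapsto 1: &\quad w_1 - \theta > 0 \\
(0,1) \mapsto 1: &\quad w_2 - \theta > 0
\end{aligned}
$$

Adding the first two inequalities gives $w_1 + w_2 - 2\theta < 0$, while adding the last two gives $w_1 + w_2 - 2\theta > 0$, a contradiction. No weights and bias satisfy all four conditions, so training can only reduce error to a nonzero minimum. The same argument applies to any stricter response criterion, such as the ‘hit’ definition used in our Results.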

Method
Research Goals

[24] Perceptrons belong to a large class of machine learning methods, which use information from stimuli to adjust internal parameters in order to find regularities in a set of training patterns (Bishop 2006). The goals of a machine learning project determine its methodology. For example, Bishop notes that machine learning can be used to solve applied prediction problems, in which regularities learned during training are used to classify new stimuli never presented during training. For such applied prediction problems, it is customary to train an algorithm on a subset of available stimuli (the training set) and then to measure how well its responses generalize to different stimuli (the test set or the validation set).

[25] Other goals lead to different methodologies. For example, a machine learning algorithm can be used to reveal the structure of data without aiming to generalize that structure to new instances, making the algorithm more similar in spirit to statistical techniques like factor analysis (Gorsuch 1983). With this goal, the methodology uses a training set to teach the algorithm but does not test the algorithm’s performance on a validation set. Instead, after training, we examine the regularities the algorithm has discovered in order to make judgments about stimulus properties. Such properties, when discovered, are informative in their own right. For example, we have interpreted the internal structure of many different networks trained to make musical judgments (Dawson, Perez, and Sylvestre 2020; Perez et al. 2023; Dawson 2018). Because we were interested in the structure discovered by those networks (the regularities in a specific dataset), the networks were never tested on a validation set. Instead, we used the discovered regularities to inform music theory.

[26] The research reported below attempted to determine whether a simple network could capture regularities in a particular dataset. The research was not designed to produce a device to predict cultural regions for any folk song, and therefore did not use separate training and validation sets. Instead, it aimed to determine whether Cantometric features could be used at all to predict regions for a particular set of songs studied by other researchers. As we report below, our simple networks predicted regions with high accuracy. We also report how the structure of our trained networks can inform Cantometric theory by modifying Cantometric summaries (summodal profiles) to make them more discriminating. The success of our machine learning project raises a number of questions for future research, which we describe later in the paper. One involves using Cantometric features to solve an applied prediction problem (i.e., predicting the cultural region for any folk song), which would require a different methodology, one using both training and validation sets.

Task and Stimuli

[27] We trained perceptrons to use Cantometric features to classify songs into one of six different cultural regions. We began with a set of 1,800 songs belonging to the Cantometric database now maintained at the Global Jukebox (https://theglobaljukebox.org) (Wood et al. 2022). These 1,800 songs were the same as those used by Guo et al. (2018), because our goal was to compare the performance of our perceptrons to that of their DNN. We then examined the Cantometric features assigned to each song, looking for assigned state values that did not correspond to the possible values provided in the Cantometric scoring handbook (Lomax et al. 1976). When such an anomaly (presumably due to database error) was discovered, we removed the song from our training set. Because 197 songs had anomalous Cantometric scores, our final training set consisted of 1,603 songs. We provide the number of songs in our training set for each cultural region in Table 3.
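The screening step described above can be sketched as follows. The data layout (a mapping from feature number to a list of scored states, numbered from 1) and the function name `is_valid` are assumptions for illustration. The state counts for Features 15 and 16 follow the text; the other 35 features are omitted, and the song scores are invented.

```python
# Number of possible states per feature. Features 15 and 16 follow the text
# (four and thirteen states); the other 35 features are omitted here.
N_STATES = {15: 4, 16: 13}


def is_valid(song_scores, n_states=N_STATES):
    """True if every scored state exists for its feature (states numbered from 1).

    Unknown features count as invalid because n_states.get returns 0.
    """
    return all(
        1 <= state <= n_states.get(feature, 0)
        for feature, states in song_scores.items()
        for state in states
    )


# Hypothetical scores: song_b uses state 5 for Feature 15, which has only 4 states.
songs = {
    "song_a": {15: [1], 16: [13]},
    "song_b": {15: [5], 16: [2]},
}
clean = {name: scores for name, scores in songs.items() if is_valid(scores)}
print(sorted(clean))  # ['song_a']
```

A song is dropped as soon as any one of its scored states is impossible, which mirrors removing whole songs rather than repairing individual scores.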

Table 3. The number of songs for each region in the training set.

[28] When we began our study, the Cantometric features of songs were not publicly available. Dr. Anna Lomax Wood, president of the Association for Cultural Equity, provided the Cantometrics data used to create our training set. Now, however, a full dataset of 5,776 songs with Cantometric scores is publicly available from https://github.com/theglobaljukebox (Wood et al. 2022). Furthermore, before this database was released, the songs’ Cantometric features were cleaned and curated, resolving some of the problems that led us to remove songs with impossible feature values from our training set. We will use the new dataset in future research on networks and Cantometrics.

 

Perceptrons

 

[29] Each perceptron used six output units, one for each cultural region in Table 3 (Figure 1). Because perceptrons have no hidden units, the connections between one output unit and the input units are independent of those between another output unit and the same input units (Dawson 2005). In other words, training a perceptron with six output units is equivalent to training six different perceptrons (each detecting a different region) on the same input patterns. Each perceptron used 222 input units, each corresponding to a particular state of a particular Cantometric feature (see Table 2 and Figure 1). For a stimulus song, an input unit was assigned a value of 1 if the state it represented was used in the Cantometric scoring of the song, and 0 otherwise. This coding can represent songs scored with more than one state for a feature, because more than one input unit associated with the same Cantometric feature can be turned on.
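The input coding described above can be sketched in Python. Only two features are included for brevity (the full model uses all 37 features and 222 units); the feature ordering, the `encode` function, and the example scores are hypothetical.

```python
# Illustrative subset of (feature, number of states) pairs in a fixed order.
# The full model uses all 37 features, giving 222 input units.
FEATURES = [(15, 4), (16, 13)]

# Each feature's block of input units starts where the previous block ends.
OFFSETS = {}
N_INPUTS = 0
for feature, n in FEATURES:
    OFFSETS[feature] = N_INPUTS
    N_INPUTS += n


def encode(song_scores):
    """Return a 0/1 input vector with a 1 for every scored feature state."""
    x = [0] * N_INPUTS
    for feature, states in song_scores.items():
        for state in states:  # states are numbered from 1
            x[OFFSETS[feature] + state - 1] = 1
    return x


# A song scored with two states for Feature 16 turns on two units in its block.
x = encode({15: [2], 16: [1, 3]})
print(len(x), [i for i, v in enumerate(x) if v])  # 17 [1, 4, 6]
```

Giving each feature its own contiguous block of units keeps the coding reversible: any active unit can be mapped back to exactly one feature and state.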

 

Training

 

[30] We trained ten different perceptrons to classify songs. Each perceptron began with connection weights randomly assigned values between -0.1 and 0.1, and with output unit biases (the value of θ in the logistic equation) initialized to 0. Because each perceptron began from a different starting configuration, we treated each as a different ‘participant’.

[31] We trained multiple perceptrons to determine whether different networks could achieve similar performance at the end of training, and to calculate properties of an ‘average’ perceptron trained on our classification task. As we show below, our perceptrons assign songs to regions with high accuracy. Such performance permits us to use perceptron structure (i.e., connection weights) to modify a statistical tool, called a summodal profile, which Cantometrics uses to summarize the properties of related songs (Lomax 1968, Appendix 3). Rather than relying on a single perceptron, we average connection weights across our perceptrons and use the average weights to modify summodal profiles. We describe this modification in more detail later in the paper.

[32] Training proceeded using a gradient descent learning rule developed for perceptrons whose output units employ the logistic activation function (Equation 1; Dawson 2004, 2008, 2022). Each network was trained as follows: First, a stimulus was presented by activating the input units, which caused the perceptron’s output units to respond. Error was computed for each output unit by taking the difference between the unit’s desired activity and its actual activity. We then used error to modify the output unit’s bias and connection weights according to the gradient descent rule. This procedure was repeated for the next stimulus. Thus, connection weights were modified after each stimulus presentation. Training proceeded epoch by epoch, where one epoch involved a single presentation of each song. We randomized the order of stimulus presentation at the start of every epoch.
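The training procedure can be sketched as a plain-Python loop, assuming squared error and the update in Equation 2. The bias update follows from θ being subtracted from the net input. The function names and the toy data are hypothetical, and this sketch is not the simulation software used in our study. The defaults mirror the parameters reported here (learning rate 0.1, 5,000 epochs); the toy run uses a larger learning rate.

```python
import math
import random


def logistic(net, theta):
    """Equation 1."""
    return 1.0 / (1.0 + math.exp(-net + theta))


def train_perceptron(patterns, targets, eta=0.1, epochs=5000, seed=0):
    """Online gradient descent for a perceptron with logistic output units.

    patterns: list of 0/1 input vectors; targets: list of desired output
    vectors. Returns (weights, biases); weights[j][i] links input i to output j.
    """
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0]), len(targets[0])
    # Random starting weights in [-0.1, 0.1]; biases start at 0.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    order = list(range(len(patterns)))
    for _ in range(epochs):
        rng.shuffle(order)  # new presentation order at the start of every epoch
        for p in order:
            x, t = patterns[p], targets[p]
            for j in range(n_out):
                net = sum(w * xi for w, xi in zip(weights[j], x))
                a = logistic(net, biases[j])
                g = a * (1.0 - a) * (t[j] - a)  # f'(net_j) * delta_j
                for i, xi in enumerate(x):
                    weights[j][i] += eta * xi * g  # Equation 2
                biases[j] -= eta * g  # theta is subtracted from net, so the sign flips
    return weights, biases


def respond(weights, biases, x):
    """Output activities of a trained perceptron for one input vector."""
    return [logistic(sum(w * xi for w, xi in zip(ws, x)), b)
            for ws, b in zip(weights, biases)]


# Toy data (hypothetical): three one-hot "songs" assigned to three "regions".
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
W, B = train_perceptron(X, T, eta=0.5, epochs=5000)
for x in X:
    print([round(a, 2) for a in respond(W, B, x)])
```

Updating after every presentation and reshuffling each epoch matches the procedure described in the text. A seeded `random.Random` keeps each simulated ‘participant’ reproducible.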

[33] Pilot simulations determined the values of our training parameters. We sought values that would reliably lead to excellent perceptron performance. The pilot simulations led us to choose a learning rate of 0.1. They also indicated that a perceptron would never generate correct responses to every song. This was not surprising, because perceptrons, lacking hidden units, are limited in what they can learn (Minsky and Papert 1969). Finally, the pilot simulations suggested training each perceptron for 5,000 epochs, because performance had reached its maximum by that point.

 

Results
Accuracy

 

[34] When we train networks, we hope to achieve a ‘hit’ for every output unit and every training pattern (Dawson 2004). A ‘hit’ occurs when an output unit generates activity of 0.9 or higher when desired activity is 1, or activity of 0.1 or lower when desired activity is 0. None of our ten perceptrons achieved perfect performance. However, we can use the definition of ‘hit’ to measure network accuracy after 5,000 epochs of training.

[35] With six output units and 1,603 stimuli, the maximum possible number of hits is 9,618. The accuracy of the ten perceptrons can be measured by determining how close networks come to this maximum. When we examined network performance on all stimuli after training, we found that on average a perceptron generated 9,368.10 hits (SD = 7.65), or 97.4% accuracy. The least accurate perceptron generated 9,354 hits (97.26% accuracy), while the most accurate perceptron generated 9,381 hits (97.54% accuracy). Thus, while none of the networks learned to generate six correct outputs for every song, all perceptrons came very close to perfect performance, and all performed very similarly.

[36] When accuracy is defined using hits as described above, network performance is assessed by considering every output unit individually. A more conservative measure of accuracy determines how many of the 1,603 songs cause a network to produce six hits (i.e., a correct response in all six output units). This measure evaluates network performance in terms of songs, not in terms of individual outputs, and is more conservative because a song is evaluated as being incorrectly classified even when five of the six output units produce hits.

[37] When we examined network performance with our more conservative measure, we found that on average a perceptron generated correct responses to 1,412.90 songs (SD = 6.77), amounting to 88.14% accuracy. The least accurate perceptron generated correct responses to 1,400 songs (87.34% accuracy), while the most accurate perceptron generated correct responses to 1,422 songs (88.71% accuracy). Thus, even with a more conservative measure of accuracy, all perceptrons were extremely good song classifiers. Note that both our liberal measure of accuracy (97.4%) and our more conservative measure (88.14%) exceed the accuracy of 80.7% reported by Guo et al. (2018) for their DNN.
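Both accuracy measures can be expressed in a few lines of Python. The function names are hypothetical; the last two lines re-derive the reported average percentages from the hit and song counts given above.

```python
def is_hit(activity, desired):
    """A 'hit': activity >= 0.9 when desired is 1, <= 0.1 when desired is 0."""
    return activity >= 0.9 if desired == 1 else activity <= 0.1


def accuracy(outputs, targets):
    """Return (proportion of output-unit hits, proportion of songs with all hits)."""
    unit_hits = 0
    songs_correct = 0
    for out_row, tgt_row in zip(outputs, targets):
        row = [is_hit(a, t) for a, t in zip(out_row, tgt_row)]
        unit_hits += sum(row)
        songs_correct += all(row)  # a song counts only if every output unit hits
    n_units = len(outputs) * len(outputs[0])
    return unit_hits / n_units, songs_correct / len(outputs)


# Re-deriving the reported averages: 6 outputs x 1,603 songs = 9,618 possible hits.
print(round(100 * 9368.10 / (6 * 1603), 2))  # 97.4
print(round(100 * 1412.90 / 1603, 2))        # 88.14
```

The song-level measure is the stricter one because a single missed output unit disqualifies the whole song.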

 

Using Perceptrons to Modify Summodal Profiles

 

[38] The results described above show perceptrons can learn to use Cantometric features to accurately assign songs to cultural regions. We now turn to a second issue: using our trained perceptrons to inform Cantometric theory.

[39] When the output units of a perceptron use the logistic activation function, perceptron properties can be described by probability theory (Dawson 2022). Dawson (2022, Chapter 4) demonstrated this by formally converting mathematical properties of perceptrons (activity, connection weights) into mathematical properties of Bayesian probability theory (conditional probability, odds ratios). First, an output unit’s activity is literally a conditional probability. For example, in the Figure 1 perceptron, an output unit’s activity is the conditional probability that a song originates from the output unit’s cultural region, given the song’s Cantometric features. Second, a perceptron’s connection weights can also be described by probability theory. A connection weight is the natural logarithm of the odds ratio relating an input feature to an output response, meaning the connection weight shows how an output unit’s conditional probability changes when a specific input feature is detected.
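The algebra behind the weight-as-log-odds-ratio claim can be checked numerically. Under Equation 1, the odds a/(1 − a) equal e raised to (net − θ), so switching one binary input from 0 to 1 multiplies the odds by e raised to its weight w. The sketch below confirms this for arbitrary, hypothetical values of θ, w, and the summed contribution of the other inputs. It illustrates the algebra only and does not reproduce Dawson’s (2022) full derivation.

```python
import math


def logistic(net, theta):
    """Equation 1."""
    return 1.0 / (1.0 + math.exp(-net + theta))


def odds(p):
    return p / (1.0 - p)


# Hypothetical values for one output unit.
theta = 0.7   # bias
rest = -0.3   # summed signal from all other input units
w = 1.25      # weight from the input unit representing one feature state

p_absent = logistic(rest, theta)       # that input unit is 0
p_present = logistic(rest + w, theta)  # that input unit is 1
log_odds_ratio = math.log(odds(p_present) / odds(p_absent))
print(math.isclose(log_odds_ratio, w))  # True
```

The result does not depend on the values chosen for `theta` or `rest`, which is why a single weight can summarize a feature state’s effect on region membership.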

[40] We now describe using the perceptron’s probabilistic properties to inform Cantometrics. First, we introduce another idea from Cantometrics, the summodal profile. Second, we describe a problem with summodal profiles. Third, we discuss how we can address the problem by using perceptron properties to modify summodal profiles.

[41] Summodal Profiles. Cantometrics strove to relate song style to cultural properties. Lomax used a summodal profile to summarize the properties of related songs (e.g., songs associated with the same region) and then related his summarized song styles to cultures (Lomax 1968, Appendix 3). A summodal profile summarizes how Cantometric features are distributed over a number of different songs. In a summodal profile (see Table 4), the states of each Cantometric feature are represented as proportions; the sum of a row’s proportions is 1. Table 4 provides an example summodal profile we computed for our stimulus songs from the South America region. Each number in the table represents the proportion of summarized songs possessing a particular state for a Cantometric feature. For instance, in Table 4, for the feature ‘Social Organization: Vocal Group,’ a proportion of 0.01 of the 299 summarized songs are scored as possessing S1 (‘No Singers’), while a proportion of 0.36 of the songs are scored as possessing S2 (‘One Singer’).
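A summodal profile of the kind described above can be computed as in the following sketch. The text does not specify how a song scored with several states for one feature is counted. The sketch therefore assumes each scored state adds one count and each row is then normalized to sum to 1. The data layout, the function name `summodal_profile`, and the example songs are likewise hypothetical.

```python
from collections import Counter


def summodal_profile(songs, n_states):
    """Proportion of each state, per feature, across a collection of songs.

    songs: list of dicts mapping feature -> list of scored states (from 1).
    n_states: dict mapping feature -> number of possible states.
    Assumption: every scored state contributes one count; rows sum to 1.
    """
    profile = {}
    for feature, n in n_states.items():
        counts = Counter(s for song in songs for s in song.get(feature, []))
        total = sum(counts.values())
        profile[feature] = [counts[s] / total if total else 0.0
                            for s in range(1, n + 1)]
    return profile


# Hypothetical collection of three songs scored on Feature 15 (four states).
songs = [{15: [1]}, {15: [1]}, {15: [2, 3]}]
print(summodal_profile(songs, {15: 4}))  # {15: [0.5, 0.25, 0.25, 0.0]}
```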

[42] Lomax designed summodal profiles to be quickly scanned to get a sense of the degree to which the summarized songs possessed individualistic versus integrated characteristics. More modern summodal profiles differ from Table 4 by replacing numbers with dots whose size reflects the size of each proportion, making the summodal profile even more visual (Passmore and Savage 2023). A collection of individualistic songs will have higher proportions in the left columns of a summodal profile, while a collection of integrated songs will have higher proportions in the right columns. Summodal profiles for songs belonging to different cultural regions should permit songs from different regions to be distinguished from one another.

Table 4. A summodal profile of Cantometric features. Each row corresponds to a particular feature. Each column represents a possible state for that feature. The number of possible states varies from feature to feature, ranging from three to thirteen. This table represents the 299 songs originating from the South America region. Each row in the table adds up to 1. Each value in a row indicates the proportion of songs in the sample for which a feature adopts a particular state. For example, for Feature 1, the proportion of songs that have a value of S1 for this feature is 0.01, the proportion having a value of S2 is 0.36, and so on.

 

[43] A problem with summodal profiles. However, one problem with using summodal profiles to compare song collections is that a summodal profile provides the prevalence of a feature state in a collection but does not indicate the importance of the feature state for distinguishing one collection from others. We use Table 5 to illustrate the problem for two feature states. The table presents the proportion of songs from each of our six regions scored as having a particular state for a particular Cantometric feature. Note that each proportion is high, indicating the state is prevalent in every region. However, the variation between proportions from the different regions is low, indicating that the prevalence of the state does not differentiate one region’s songs from another’s.

Table 5. The proportion of songs, for each of the six regions in our training set, which are scored as having a particular state for two Cantometric features. The mean and the standard deviation for each row are also provided.

 

[44] The problem illustrated in Table 5 is not easily resolved by considering an entire summodal profile instead of individual values. We used the input features of our training set to compute the summodal profile for each of our six regions (i.e., we computed summodal profiles for the different groups of songs summarized in the rows of Table 3). We then correlated each summodal profile with each of the other profiles. We present the correlations in Table 6. For all regions except Region 6 (Afro-Atlantic peoples), correlations between summodal profiles range from 0.70 to 0.83. In other words, each summodal profile tends to be highly similar to the others, making it difficult to use summodal profiles to characterize differences between songs from different regions.
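One way to carry out this comparison is sketched below in Python. It assumes that each summodal profile has been flattened into a single vector of feature-state proportions (222 values in our case) and that profiles are compared with the Pearson correlation; the helper names and the three toy profiles are hypothetical. The `most_extreme` helper returns the pair of regions whose correlation has the largest absolute value, a convenient one-number summary of how similar a set of profiles is.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def profile_correlations(
        profiles: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate every pair of flattened summodal profiles."""
    return {(a, b): pearson(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}


def most_extreme(corrs: Dict[Tuple[str, str], float]) -> Tuple[Tuple[str, str], float]:
    """Return the region pair whose correlation has the largest magnitude."""
    pair = max(corrs, key=lambda k: abs(corrs[k]))
    return pair, corrs[pair]


# Hypothetical flattened profiles: two features with two states each.
profiles = {
    "R1": [0.9, 0.1, 0.8, 0.2],
    "R2": [0.8, 0.2, 0.7, 0.3],
    "R3": [0.2, 0.8, 0.3, 0.7],
}
corrs = profile_correlations(profiles)
for pair, r in corrs.items():
    print(pair, round(r, 3))
print(most_extreme(corrs))
```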

Table 6. Correlations between summodal profiles computed for each set of songs belonging to the six different regions in the training set.

 

[45] Using perceptrons to improve profiles. Fortunately, the structure of a trained perceptron offers a solution to this problem. Connection weights reflect the probabilistic relationship between stimulus features and network responses (Dawson 2022a). Therefore, we can use connection weights to modify the proportions in a summodal profile. In our perceptrons, each cultural region is represented by a single output unit. A region's output unit is connected to each of the 222 input units, each of which represents one of the feature states that also appear in a summodal profile. We can therefore multiply each proportion in a summodal profile by the corresponding connection weight, a weight which reflects the probability that the feature state signals region membership. The resulting value combines two kinds of information: the likelihood of finding a feature state in a song collection (the proportion) and the importance of that state for identifying a region (the weight). As a result, large positive values in a weighted summodal profile represent feature states important for classifying a song as belonging to a region, while large negative values represent feature states important for classifying a song as not belonging to a region. Feature states that are rare, or unimportant for identifying a region, will have values near zero in the weighted summodal profile.

[46] To explore the utility of using perceptron weights to scale summodal profiles, we calculated the average connection weight between each feature state and each output unit by taking the mean weight across the ten trained perceptrons. We then scaled each summodal profile by multiplying each feature state’s proportion by the corresponding average connection weight.
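The two steps just described, averaging weights across networks and then scaling proportions, can be sketched in Python as follows. The sketch assumes that each trained perceptron's weights are stored as a matrix with one row per output unit (region) and one column per input unit (feature state), and that the summodal profile has been flattened into a vector in the same feature-state order; the function names and all numbers are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mean_weights(weight_sets: Sequence[Matrix]) -> Matrix:
    """Average weight matrices element by element across trained networks.

    Rows index output units (regions); columns index input units (feature states).
    """
    n = len(weight_sets)
    n_rows, n_cols = len(weight_sets[0]), len(weight_sets[0][0])
    return [[sum(w[r][c] for w in weight_sets) / n for c in range(n_cols)]
            for r in range(n_rows)]


def weighted_profile(proportions: Sequence[float],
                     region_weights: Sequence[float]) -> List[float]:
    """Multiply each feature-state proportion by its mean connection weight."""
    return [p * w for p, w in zip(proportions, region_weights)]


# Hypothetical: two networks, two regions, three states of one feature.
net_a = [[1.0, -2.0, 0.5], [0.0, 1.0, -1.0]]
net_b = [[3.0, -2.0, 0.5], [2.0, 1.0, -3.0]]
avg = mean_weights([net_a, net_b])      # [[2.0, -2.0, 0.5], [1.0, 1.0, -2.0]]
region_0_profile = [0.6, 0.3, 0.1]      # toy proportions for region 0
print(weighted_profile(region_0_profile, avg[0]))  # → [1.2, -0.6, 0.05]
```

In this toy case the common, positively weighted first state dominates the weighted profile, the second state becomes evidence against region 0, and the rare third state stays near zero.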

[47] Table 7 presents the proportions given earlier in Table 5, but after scaling by connection weights. Comparing the standard deviations in the two tables shows much larger values for the scaled proportions. Furthermore, the Table 7 values more clearly indicate the relationship between each feature state and the cultural regions. For instance, if a song has State 9 for melodic shape, this is very strong evidence for it belonging to R2 (because of the high positive value in Table 7) and very strong evidence against it belonging to R5 (because of the high negative value in Table 7). Similarly, Table 7 reveals that if a song has State 13 for ‘Rubato: Orchestral,’ this is strong evidence against it belonging to R1, and moderate evidence for it belonging to R4 or R6.

Table 7. The weighted proportion of songs, for each of the six regions in our training set, which are scored as having a particular state for two Cantometric features. The values in this table are calculated by scaling each Table 5 value by the appropriate mean connection weight. The mean and the standard deviation for each row are also provided.

 

[48] Not surprisingly, differences between whole summodal profiles are accentuated after weighting. Table 8 provides the correlations amongst the weighted summodal profiles. Note that the Table 8 correlations are substantially smaller than the correlations presented in Table 6. The most extreme correlation in Table 8 has a value of -0.28, roughly a third of the absolute value of the most extreme correlation in Table 6 (0.83). In other words, scaling summodal profiles by connection weights reduces the similarities between profiles and thereby substantially increases the differences between them. In short, scaled summodal profiles provide clearer evidence for distinguishing the songs of one culture from those of another.

 

Table 8. Correlations amongst the weighted summodal profiles computed for each set of songs belonging to the six different regions in the training set.

 

Discussion

 

[49] The current paper explored two related issues. First, if we encode songs as sets of Cantometric features, can a simple network, a perceptron, use these features to assign songs to cultural regions? Second, can the structure of such a perceptron be used to inform Cantometric theory?

[50] With respect to the first question, when songs are represented by Cantometric features, perceptrons can map songs to cultural regions with high accuracy. Our perceptrons illustrate how representing stimuli with appropriate features permits simple ANNs to accomplish complex tasks. Of course, such an illustration is, in principle, unsurprising. It is well known that a computer simulation's performance depends on how information is encoded. However, in practice, one must discover a particular encoding which works well for solving a particular problem. We have found no reports in the literature about how well Cantometric features can be used to assign songs to regions. Our perceptrons show that Cantometric features are indeed well suited to this task.

[51] In particular, the performance of our perceptrons offers new support for Lomax's Cantometrics scoring scheme. Cantometrics occupies a controversial position in the literature (Savage 2018; Seeger 2006). One source of controversy lies in methodological criticisms raised in several reviews of Lomax's 1968 book Folk song style and culture (Downey 1970; Driver 1970; McLean 1973; Nettl 1970; Pantaleoni 1972). These criticisms concerned the nature of the Cantometric scoring method, its reliability, its potential bias, and the small number of songs per culture to which it was applied, and on these grounds they disputed Cantometrics' value: “The basic data of Cantometrics are questionable in some cases and demonstrably wrong in others” (Pantaleoni 1972, 158). Responses to such criticisms have appeared recently (Savage 2018; Wood 2018b). Our results, which show that Cantometric features can accurately relate songs to cultural regions, provide additional support for Lomax's methodology. As noted earlier, we have not found any previous papers showing that computational methods can predict cultural region from Cantometric features.

[52] Our second question involved determining whether simpler networks could be used to inform theory. We chose perceptrons to explore the link between Cantometric features and cultural regions because a perceptron’s connection weights are strongly related to probability theory (Dawson 2022a; Dawson et al. 2009; Dawson and Gupta 2017; Dawson and Dupuis 2012). A connection weight assigned to a Cantometric state indicates the degree to which the state supports (or negates) assigning a song to a particular cultural region.
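The simplest possible case shows why a connection weight can be read probabilistically. The Python sketch below assumes a single output unit with a logistic activation function trained with the delta rule (gradient descent on squared error), and a toy contingency in which one feature state is either absent or present, with region-membership probabilities of 0.2 and 0.8 respectively; these probabilities, the learning rate, and the epoch count are illustrative assumptions, not settings from our simulations. Because the delta-rule update is linear in the target, training on the probabilities themselves gives the expected update for 0/1 targets. At equilibrium the unit's output equals each conditional probability, so the learned weight equals the difference in log odds, ln(0.8/0.2) − ln(0.2/0.8) = ln 16 ≈ 2.77.

```python
import math


def logistic(net: float) -> float:
    return 1.0 / (1.0 + math.exp(-net))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def train_single_feature(p_absent: float, p_present: float,
                         lr: float = 0.5, epochs: int = 20000):
    """Batch delta-rule training of one logistic output unit (hypothetical toy).

    Two input patterns: the feature state absent (x = 0) or present (x = 1).
    Targets are the probabilities of region membership given each pattern.
    """
    bias, weight = 0.0, 0.0
    patterns = [(0.0, p_absent), (1.0, p_present)]
    for _ in range(epochs):
        d_bias = d_weight = 0.0
        for x, target in patterns:
            out = logistic(bias + weight * x)
            # Delta rule: negative gradient of squared error w.r.t. net input.
            delta = (target - out) * out * (1.0 - out)
            d_bias += delta
            d_weight += delta * x
        bias += lr * d_bias
        weight += lr * d_weight
    return bias, weight


bias, weight = train_single_feature(p_absent=0.2, p_present=0.8)
print(f"learned weight: {weight:.3f}")
print(f"log-odds difference: {logit(0.8) - logit(0.2):.3f}")
```

With many feature states active at once, as in our 222-input perceptrons, weights are not in general simple log-odds differences, but the same logic underlies reading a large positive weight as evidence for a region and a large negative weight as evidence against it.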

[53] We demonstrated that perceptron weights can inform Cantometric theory by modifying summodal profiles. Summodal profiles were used by Lomax to summarize properties of related songs, such as songs associated with the same cultural region. However, Lomax's summodal profiles of songs from different regions can be highly correlated. We used perceptron weights to modify Cantometric scores when we created new summodal profiles. Our weighted profiles were much less correlated with one another, showing that ANN properties can make an important Cantometric representation more discriminating.

[54] The performance of our perceptrons raises questions to be explored in future research. First, none of our ANNs achieved perfect performance, suggesting that hidden units may be required to classify all songs correctly. We can explore how multilayer perceptrons (MLPs), here networks with a single layer of hidden units, learn the same classification problem described in the current paper. We have trained MLPs on a variety of music-related tasks and interpreted their internal structure in detail (Dawson 2018; Dawson, Perez, and Sylvestre 2020; Perez et al. 2023). When hidden units are incorporated into an ANN, they detect higher-order combinations of input features. Interpreting the structure of MLPs trained on the task described above could therefore reveal new higher-order relationships between Cantometric features, relationships important to using song properties to identify cultural regions. Discovering such relationships within an MLP would demonstrate another way in which ANNs could contribute to Cantometric theory.

[55] Second, the fact that our networks do not perform perfectly suggests a different perspective on network interpretation: examining the kinds of errors the perceptrons make. Examining regularities in errors is a common way of refining theories in cognitive science, because errors are often systematic and therefore informative (Pylyshyn 1984; Dawson 2013, 2022b). One avenue of future research is to examine network errors to see whether they too can inform Cantometric theory.

[56] Third, the high performance of our perceptrons suggests the feasibility of developing an applied prediction tool which uses Cantometric features to identify cultural regions. Developing such a tool requires training networks on different subsets of songs and then testing their performance on validation sets built from songs not used in training. Such a project requires extensive work, because many different training and validation sets must be used to properly assess network performance, and because a variety of networks, ranging from perceptrons through MLPs to DNNs, need to be explored. Fortunately, since this research project was undertaken, a full dataset of 5,776 songs with Cantometric scores has been cleaned and curated and is publicly available from https://github.com/theglobaljukebox (Wood et al. 2022). The current results indicate that such a project is worth the effort.
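One standard way to build many training and validation sets is k-fold cross-validation, in which every song serves in a validation set exactly once. The Python sketch below only generates the index splits; it assumes songs are referred to by position (0 to n − 1), and the function name `k_fold_splits`, the choice of k = 10, and the seed are hypothetical. Loading the Global Jukebox data and training networks on each split are left out.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_items: int, k: int,
                  seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index splits that partition range(n_items)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # reproducible random order
    folds = [indices[i::k] for i in range(k)]   # k folds of nearly equal size
    splits = []
    for i in range(k):
        validation = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i
                       for idx in fold)
        splits.append((train, validation))
    return splits


splits = k_fold_splits(5776, k=10)
for train, validation in splits[:2]:
    print(len(train), len(validation))
```

A network would then be trained on each `train` list and scored on the matching `validation` list, and performance averaged over the k runs.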

[57] A pilot simulation indicates that the perceptrons described in the current paper may generalize well. We randomly split the training set used in the simulations described above into a training set of 1,208 patterns (about 75% of the full training set) and a test set of 395 patterns (about 25%). We trained a perceptron on the 1,208 patterns using the procedure described earlier. At the end of training, the perceptron generated 7,161 hits and 87 misses, an accuracy of about 98%. We then presented the 395 test patterns to the perceptron without additional training. The perceptron generated 2,079 hits and 291 misses to the new patterns, an accuracy of about 88%. This pilot result suggests that a larger, properly designed study of our perceptrons' generalization is likely to show strong results.
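The reported accuracies can be checked with simple arithmetic. The totals are consistent with counting one hit or miss per output unit per pattern (1,208 × 6 = 7,248 responses in training; 395 × 6 = 2,370 at test); that reading is our inference from the numbers rather than a detail given in the text. The Python sketch below makes the check explicit; the variable names are hypothetical.

```python
# Hit/miss totals reported for the pilot simulation.
N_OUTPUTS = 6                      # one output unit per cultural region
train_patterns, test_patterns = 1208, 395
train_hits, train_misses = 7161, 87
test_hits, test_misses = 2079, 291


def accuracy(hits: int, misses: int) -> float:
    """Proportion of output-unit responses that were correct."""
    return hits / (hits + misses)


# Assumed counting scheme: one response per output unit per pattern.
assert train_hits + train_misses == train_patterns * N_OUTPUTS   # 7,248
assert test_hits + test_misses == test_patterns * N_OUTPUTS      # 2,370

print(f"training accuracy: {accuracy(train_hits, train_misses):.3f}")
print(f"test accuracy: {accuracy(test_hits, test_misses):.3f}")
print(f"test share of data: {test_patterns / (train_patterns + test_patterns):.3f}")
```

The sketch prints about 0.988 for training and 0.877 for the test patterns, matching the rounded 98% and 88% reported for the pilot, with the test set holding roughly a quarter of the 1,603 patterns.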

[58] Fourth, the current research explored whether using different features to represent stimuli would permit simple ANNs to perform a task which had previously been studied only with much more complex networks. The performance of our perceptrons demonstrates the plausibility of this approach. However, it also raises the question of whether more traditional classification techniques, such as multiple regression, decision trees, or cluster analysis, can also map songs to cultural regions using Cantometric features as input. If so, then ANNs may not be required to create an applied prediction tool, provided that songs are represented by features which make assigning cultural regions an easier task.

Acknowledgements

 

We would like to thank and acknowledge Dr. Anna Lomax Wood, president of the Association for Cultural Equity (https://www.culturalequity.org/), for providing the Cantometrics data free of charge to our research team. We thank the Kule Institute for Advanced Study (KIAS) at the University of Alberta for providing a Team Grant (2016) and Cluster Grant (2017) supporting this research.

 

References

 

Adadi, Amina, and Mohammed Berrada. 2018. “Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI).” IEEE Access 6: 52138–52160. https://doi.org/10.1109/access.2018.2870052.

Ali, Sajid, Tamer Abuhmed, Shaker El-Sappagh, Khan Muhammad, Jose M. Alonso-Moral, Roberto Confalonieri, Riccardo Guidotti, Javier Del Ser, Natalia Diaz-Rodriguez, and Francisco Herrera. 2023. “Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence.” Information Fusion 99. https://doi.org/10.1016/j.inffus.2023.101805.

Angelov, Plamen P., Eduardo A. Soares, Richard Jiang, Nicholas I. Arnold, and Peter M. Atkinson. 2021. “Explainable artificial intelligence: an analytical review.” Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery 11 (5). https://doi.org/10.1002/widm.1424.

Arrieta, A. B., N. Diaz-Rodriguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera. 2020. “Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI.” Information Fusion 58: 82–115. https://doi.org/10.1016/j.inffus.2019.12.012.

Bishop, C.M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer.

Briot, Jean-Pierre. 2021. “From artificial neural networks to deep learning for music generation: history, concepts and trends.” Neural Computing & Applications 33 (1): 39–65. https://doi.org/10.1007/s00521-020-05399-0.

Confalonieri, Roberto, Ludovik Coba, Benedikt Wagner, and Tarek R. Besold. 2021. “A historical perspective of explainable Artificial Intelligence.” Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery 11 (1). https://doi.org/10.1002/widm.1391.

Dawson, M.R.W. 2004. Minds And Machines: Connectionism And Psychological Modeling. Malden, MA: Blackwell Pub.

—. 2005. Connectionism: A Hands-on Approach. 1st ed. Oxford, UK; Malden, MA: Blackwell Pub.

—. 2008. “Connectionism and classical conditioning.” Comparative Cognition and Behavior Reviews 3 (Monograph): 1–115.

—. 2009. “Computation, cognition – and connectionism.” In Cognition, Computation, and Pylyshyn, edited by D. Dedrick and L. Trick, 175–199. Cambridge, MA: MIT Press.

—. 2013. Mind, Body, World: Foundations Of Cognitive Science. Edmonton, AB: Athabasca University Press.

—. 2018. Connectionist Representations of Tonal Music: Discovering Musical Patterns By Interpreting Artificial Neural Networks. Edmonton, AB: Athabasca University Press.

—. 2022a. “Probability learning by perceptrons and people.” Comparative Cognition and Behavior Reviews 15 (Monograph): 1–188.

—. 2022b. What is cognitive psychology? Edmonton, AB: Athabasca University Press.

Dawson, M.R.W., and B. Dupuis. 2012. “Equilibria of perceptrons for simple contingency problems.” IEEE Transactions On Neural Networks And Learning Systems in press.

Dawson, M.R.W., B. Dupuis, M. L. Spetch, and D.M. Kelly. 2009. “Simple artificial networks that match probability and exploit and explore when confronting a multiarmed bandit.” IEEE Transactions on Neural Networks 20 (8): 1368–1371.

Dawson, M.R.W., and M. Gupta. 2017. “Probability matching in perceptrons: Effects of conditional dependence and linear nonseparability.” Plos One 12 (2): e0172431. https://doi.org/10.1371/journal.pone.0172431.

Dawson, M.R.W., A. Perez, and S. Sylvestre. 2020. “Artificial neural networks solve musical problems with Fourier phase spaces.” Scientific Reports 10 (1): 7151. https://doi.org/10.1038/s41598-020-64229-4.

Dawson, M.R.W., and J.A.Z. Zielinski. 2018. “Key-finding by artificial neural networks that learn about key profiles.” Canadian Journal of Experimental Psychology-Revue Canadienne De Psychologie Experimentale 72 (3): 153–170. https://doi.org/10.1037/cep0000135.

de Bruijn, Hans, Martijn Warnier, and Marijn Janssen. 2022. “The perils and pitfalls of explainable AI: Strategies for explaining algorithmic decision-making.” Government Information Quarterly 39 (2). https://doi.org/10.1016/j.giq.2021.101666.

Deeks, A. 2019. “The judicial demand for explainable artificial intelligence.” Columbia Law Review 119 (7): 1829–1850.

Downey, J. C. 1970. “Folk song style and culture: A staff report on cantometrics by Alan Lomax.” Ethnomusicology 14 (1): 63–67.

Driver, H. E. 1970. “Folk song style and culture: A staff report on cantometrics by Alan Lomax.” Ethnomusicology 14 (1): 57–62. https://doi.org/10.2307/850293.

Freeman, L. C., and A. P. Merriam. 1956. “Statistical classification in anthropology: An application to ethnomusicology.” American Anthropologist 58 (3): 464–472. https://doi.org/10.1525/aa.1956.58.3.02a00060.

Ghassemi, Marzyeh, Luke Oakden-Rayner, and Andrew L. Beam. 2021. “The false hope of current approaches to explainable artificial intelligence in health care.” Lancet Digital Health 3 (11): e745–e750.

Gorsuch, R.L. 1983. Factor Analysis. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.

Guo, Y., M. Frishkopf, S.P. Hernandez, and V. Bulitko. 2018. “Music classification based on cantometrics.” The Society for Ethnomusicology 63rd Annual Meeting, Albuquerque, New Mexico, November 2018.

Hinton, G. E., S. Osindero, and Y. Teh. 2006. “A fast learning algorithm for deep belief nets.” Neural Computation 18 (7): 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527.

Humphrey, E.J., J.P. Bello, and Y. LeCun. 2013. “Feature learning and deep architectures: new directions for music informatics.” Journal of Intelligent Information Systems 41 (3): 461–481. https://doi.org/10.1007/s10844-013-0248-5.

LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep learning.” Nature 521 (7553): 436–444. https://doi.org/10.1038/nature14539.

Lena, J. C., and R. A. Peterson. 2008. “Classification as culture: Types and trajectories of music genres.” American Sociological Review 73 (5): 697–718. https://doi.org/10.1177/000312240807300501.

Lomax, A. 1959. “Folk song style.” American Anthropologist 61 (6): 927–954. https://doi.org/10.1525/aa.1959.61.6.02a00030.

—. 1962. “Song structure and social structure.” Ethnology 1 (4): 425–451. https://doi.org/10.2307/3772850.

—. 1967. “Song styles: An indicator of popular culture.” Public Opinion Quarterly 31 (3): 469–470.

—. 1968. Folk song style and culture. American Association for the Advancement of Science Publication no. 88. Washington, DC: American Association for the Advancement of Science.

—. 1977. “Universals in song.” World of Music 19 (1–2): 117–130.

Lomax, A., R. Rudd, V. Grauer, N. Berkowitz, B. L. Hawes, and C. Kulig. 1976. Cantometrics: An approach to the anthropology of music: Audiocassettes and a handbook. Berkeley, CA: Extension Media Center, University of California.

McLean, M. 1973. “Folk song style and culture by Alan Lomax.” Journal of the Polynesian Society 82 (4): 415–422.

Minh, Dang, H. Xiang Wang, Y. Fen Li, and Tan N. Nguyen. 2022. “Explainable artificial intelligence: a comprehensive review.” Artificial Intelligence Review 55 (5): 3503–3568. https://doi.org/10.1007/s10462-021-10088-y.

Minsky, M.L., and S. Papert. 1969. Perceptrons: An Introduction To Computational Geometry. 1st ed. Cambridge, MA: MIT Press.

Montavon, G., W. Samek, and K. R. Muller. 2018. “Methods for interpreting and understanding deep neural networks.” Digital Signal Processing 73: 1–15. https://doi.org/10.1016/j.dsp.2017.10.011.

Mousavi, N., H. Siqueira, P. Barros, B. Fernandes, and S. Wermter. 2016. “Understanding how deep neural networks learn face expressions.” In 2016 International Joint Conference on Neural Networks (IJCNN), 227–234.

Moysis, Lazaros, Lazaros Alexios Iliadis, Sotirios P. Sotiroudis, Achilles D. Boursianis, Maria S. Papadopoulou, Konstantinos-Iraklis D. Kokkinidis, Christos Volos, Panagiotis Sarigiannidis, Spiridon Nikolaidis, and Sotirios K. Goudos. 2023. “Music deep learning: Deep learning methods for music signal processing-a review of the state-of-the-art.” IEEE Access 11: 17031–17052. https://doi.org/10.1109/access.2023.3244620.

Murdock, G. P. 1967a. Ethnographic atlas. Pittsburgh: University of Pittsburgh Press.

—. 1967b. “Ethnographic atlas: A summary.” Ethnology 6 (2): 109–236. https://doi.org/10.2307/3772751.

Nettl, B. 1970. “Folk song style and culture by Alan Lomax.” American Anthropologist 72 (2): 438–441. https://doi.org/10.1525/aa.1970.72.2.02a00600.

Pantaleoni, H. 1972. “Folk song style and culture by Alan Lomax.” Yearbook of the International Folk Music Council 4: 158–161.

Passmore, S., and P. E. Savage. 2023. “The exceptions and the rules in global musical diversity.” Journal of Cognition 6 (1). https://doi.org/10.5334/joc.312.

Perez, Arturo, Helen L. Ma, Stephanie Zawaduk, and Michael R. W. Dawson. 2023. “How Do Artificial Neural Networks Classify Musical Triads? A Case Study in Eluding Bonini’s Paradox.” Cognitive Science 47 (1): e13233. https://doi.org/10.1111/cogs.13233.

Pylyshyn, Z.W. 1984. Computation And Cognition. Cambridge, MA: MIT Press.

Rosenblatt, F. 1958. “The perceptron: A probabilistic model for information storage and organization in the brain.” Psychological Review 65 (6): 386–408.

—. 1962. Principles Of Neurodynamics. Washington: Spartan Books.

Savage, P.E. 2018. “Alan Lomax’s cantometrics project: A comprehensive review.” Music & Science 1: 1–19.

Seeger, A. 2006. “Lost lineages and neglected peers: Ethnomusicologists outside academia.” Ethnomusicology 50 (2): 214–235.

Tjoa, Erico, and Cuntai Guan. 2021. “A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI.” IEEE Transactions on Neural Networks and Learning Systems 32 (11): 4793–4813. https://doi.org/10.1109/tnnls.2020.3027314.

Wood, A.L.C. 2018a. “‘Like a Cry from the Heart’: An insider’s view of the genesis of Alan Lomax’s ideas and the legacy of his research: Part I.” Ethnomusicology 62 (2): 230–264. https://doi.org/10.5406/ethnomusicology.62.2.0230.

—. 2018b. “‘Like a Cry from the Heart’: An insider’s view of the genesis of Alan Lomax’s ideas and the legacy of his research: Part II.” Ethnomusicology 62 (3): 403–438. https://doi.org/10.5406/ethnomusicology.62.3.0403.

Wood, A.L.C., Kathryn R. Kirby, Carol R. Ember, Stella Silbert, Sam Passmore, Hideo Daikoku, John McBride, Forrestine Paulay, Michael J. Flory, John Szinger, Gideon D’Arcangelo, Karen Kohn Bradley, Marco Guarino, Maisa Atayeva, Jesse Rifkin, Violet Baron, Miriam El Hajli, Martin Szinger, and Patrick E. Savage. 2022. “The Global Jukebox: A public database of performing arts and culture.” Plos One 17 (11). https://doi.org/10.1371/journal.pone.0275469.

Zhang, Zhongheng, Marcus W. Beck, David A. Winkler, Bin Huang, Wilbert Sibanda, and Hemant Goyal, on behalf of the AME Big-Data Clinical Trial Collaborative Group. 2018. “Opening the black box of neural networks: methods for interpreting neural network models in clinical applications.” Annals of Translational Medicine 6 (11). https://doi.org/10.21037/atm.2018.05.32.