A Technical Explanation of Theme and Variations: A Computer Music Work Utilizing Network Compositional Algorithms




Tom North





One approach to twentieth-century music has been the application of computer algorithms to composition. During the spring of 1984, I designed a compositional algorithm that generated musical output based on a supplied input. The opportunity to utilize this algorithm, which I refer to as network, inspired the creation of my first computer music composition, Theme and Variations. Along with assisting in composition, the computer also functioned as the means of sound synthesis for the entire piece. This document will explain network construction, strategy, and usage, plus the associated digital instrument design. Theme and Variations was realized at the Computer Audio Research Laboratory (CARL), located in the Center for Music Experiment at the University of California, San Diego.



At the time this piece was created, the environment for computer music creation on the CARL system was highly flexible and virtually limitless. The CARL system could theoretically synthesize any sound representable by a voltage-driven speaker. The piece was realized on the CARL DEC VAX 11/780 computer, utilizing the Berkeley UNIX operating system. All software for this piece was written in the "C" programming language.


Two versions of the piece were created, each using a different sampling rate[1]. Sampling rate affects the bandwidth of the output signal, with the highest frequency of the band theoretically equal to one half the sampling rate. The first version of Theme and Variations was computed at a sampling rate of 16KHz (K = 1024, Hz = Hertz), resulting in a theoretical bandwidth of 0-8KHz. The effective audio bandwidth was approximately 0-6.4kHz (k = 1000). The second version was essentially the same computer score as the first but sampled at 48KHz, resulting in a much wider effective audio bandwidth of about 0-20kHz. This greatly improved the sound quality of the piece.




Theme and Variations uses the computer in two fundamental capacities. It is used both as a compositional tool and as the means of sound synthesis for the entire piece. One of the motivating factors in realizing this piece was the opportunity to apply the compositional algorithm network. The following chapters explain the construction of digital instruments, the strategy and design of the network compositional algorithm, and finally the application and interfacing of the component parts to generate music.

Computer Instrument Design


The sound medium ideally suited to express Theme and Variations was an ensemble of different instruments. To realize this medium, an ensemble of computer instruments was created and digitally synthesized by the CARL system.




The digital synthesis program used on the CARL system, written by F. Richard Moore, is cmusic[2]. The input to cmusic is usually a text file, often called a score, and is typically subdivided into two sections: the instrument definition(s), and the note list. The note list sequences the events as described by the instrument(s). This chapter will focus on digital instrument design using pictorial representations.


Whenever possible each instrument design was to have all parameters except pitch and amplitude enclosed within the instrument definition. This meant the note list would contain five basic parameters: instrument name, start time, duration, amplitude, and pitch. Nine different instruments were designed, giving a wide range of timbres. To create the different instruments, three different synthesis techniques were utilized: simplified physical modeling, waveshaping, and frequency modulation. These techniques were used both separately and in combination to obtain the different timbres.

Karplus-Strong Plucked String Algorithm


Prior to the realization of Theme and Variations, a new cmusic unit generator (UG), entitled fltdelay, was implemented using the Karplus-Strong plucked string algorithm[3]. The basic components of the algorithm are a recirculating delay line and a low-pass filter expressed by the formula:

    y(n) = x(n) + [y(n - N) + y(n - N - 1)] / 2

where y(n) is the value of the output signal at sample n, x(n) is the value of the input signal at sample n, and N is the approximate pitch period of the note in samples. In all of the uses of this algorithm, the delay line was pre-loaded from a wavetable containing random noise, resulting in a noise burst for the initial attack. The entire system models the physical nature of a plucked string, with the noise burst representing the pluck, the recirculating delay line acting as the resonator, and the low-pass filter simulating the damping or decaying of the string.


For increased flexibility, several extensions and modifications of the basic model were suggested by Jaffe and Smith[4]. Mark Dolson, a visiting researcher at the time, implemented the fltdelay UG using the above algorithm and extensions, as well as his own additions. Even in its simplest usage, fltdelay offers a spectrally rich and dynamic sound. This was greatly magnified at the higher sampling rate of 48K. It was therefore used as a fundamental timbre in the ensemble. Three instruments are based on this UG, with several other instruments using it to augment their sound.


The fltdelay UG has many independent control variables offering a wide range of timbral variation. Following is a list of the important arguments with a brief description:

decay: duration factor (0 to 1: 1 = full duration).

level: amplitude of pluck (0 to 10: 10 = loudest).

final: number of dB down at note's end (0 to 100: 40 = normal).

onset: attack time for pluck (0 to .1 sec: 0 = fast).

place: pluck point on string (0 to .5: 0 = normal).

noise: time of initial noise burst (-1 to +0.1 sec: 0 = normal).



                Figure 1: Guitar Instrument


Guitar Instrument


The first instrument designed for the ensemble utilized fltdelay in an extremely simple, yet very effective capacity, to generate a guitar-like timbre. Figure 1 gives a pictorial instrument description with a table of argument values for fltdelay.


One strategy in overall design was to normalize all amplitudes internally, within each instrument. This is to say a 0dB amplitude in one instrument would sound equivalent to a 0dB amplitude in a second instrument (0dB is defined as the maximum amplitude in cmusic). To accomplish this it was necessary to adjust amplitude arguments within primary unit generators so their output would be close to 0dB, and then add a mult unit generator, functioning like a master volume control, just before the output. In the fltdelay UG, a value of 2.2 for the level argument gave amplitudes of around 0dB over a wide range of pitches. With fltdelay normalized, the actual amplitude was adjusted in the mult unit generator below it, with this value declared in the note statement. The onset argument was set at .001 to remove a small amount of hardness from the attack. The remaining arguments used default values and gave a very rewarding guitar-like timbre.


Guitar-glissando Instrument


This instrument is similar to the guitar design with a transition UG added to the pitch parameter, to allow for glissandi. Network considered the guitar glissandoing up and down as two timbres that were separate from the non-glissando guitar. Depending upon what network requested, this instrument would glissando either up or down, always by the amount of a minor third. Both the start pitch and the ending pitch for the glissando were set in the note statement. Figure 2 shows the guitar-glissando instrument. The final argument was changed to a value of 10 (its value was 40 in the guitar), which lessened the decay, allowing the ending pitch to be more apparent.

Pizzicato Violin Instrument


This instrument, shown in Figure 3, was modeled after the sound of a pizzicato violin, and again uses fltdelay as its fundamental means of synthesis. To create this timbre required the modification of the guitar design in two ways: an envelope UG was added, and adjustments to several fltdelay arguments were necessary. The additional unit generator serves as an overall amplitude envelope that does not affect the attack, but decays sharply after 50 milliseconds (ms). Therefore, no matter what duration is specified in the note statement, the result is always a very short, staccato sound. (The dotted line in this and all functions represents a variable time value, computed automatically, and is equivalent to the note's duration minus the fixed time values.) The amplitude envelope also contains the master volume argument that is set in the note statement.


                       Figure 2: Guitar-glissando instrument

                                 Figure 3: Pizzicato violin instrument


Three fltdelay arguments were changed from the guitar design to attain the pizzicato violin timbre. The final value was changed to 80, contributing to a short, staccato sound. To create a sharper attack, the onset value was changed to 0. For better amplitude normalization, the level argument value was changed to 2.15.


Clarinet Instrument Design Using Waveshaping


Since many instruments in the ensemble have the plucked string timbre due to using fltdelay, this instrument was designed for contrast and modeled after the sound of a clarinet. The synthesis technique used to achieve this timbre is non-linear distortion or waveshaping[5]. The basic process involves feeding a cosine wave through a non-linear processing device, thus distorting the input. The processing device contains a transfer or shaping function that determines the output waveform. One useful transfer function, creating a harmonic waveform output with easy control over the number of harmonics and their respective amplitudes, is generated by using Chebychev polynomials.


The principal tools within cmusic that allow this type of synthesis are the lookup UG, representing the processing device, and the function generator, chubby, that computes Chebychev polynomials for the transfer function used within lookup. This instrument is based on a spectral plot of a clarinet playing Eb 4 that is found in an article by James Moorer[6]. (In describing pitch register, the U.S.A. Standards Association octave notation[7] will be used, with middle C represented by C4.)


In explaining the design of this instrument a modular approach will be used, concentrating first on the component parts and then on the actual assemblage. The basic model of a waveshaped clarinet is found in Figure 4. Originally this was the entire instrument, but became a component in a more complicated design. The time and amplitude values from the fundamental harmonic of the spectral plot supplied the curve for the overall amplitude envelope (represented by "f2" in Figure 4). The peak values of each harmonic were used for the transfer function, computed via chubby, with the original values represented by "f4" in Figure 7. The add UG was necessary to correct the non-zero DC offset that resulted from using transfer function "f4" (also to correct transfer function "f5", which will be explained shortly).


The primary problem in using just one waveshaping module is that it was based on the spectral plot of one pitch (Eb4) and one duration, 330ms. Synthesis of that pitch and that duration sounded fairly authentic, as did notes and durations close to the original. Notes more than a minor third away, or lasting a half second or more, began to sound static and artificial. To alleviate this problem the instrument was modified to change spectra through its duration and contain a random vibrato. The spectral change was accomplished by adding another waveshaping module (with a slightly different transfer function) and a crossfade function that segues one spectrum into the other, via amplitude envelopes.


                            Figure 4: Waveshaped clarinet module




                                   Figure 5: Random vibrato module


The random vibrato module outputs the fundamental pitch with a random vibrato, and is depicted in Figure 5. The maximum amount of pitch deviation in the vibrato is equal to .02 times the fundamental pitch in Hertz. The random UG generates a new value at a 7Hz rate. The vibrato is switched off for the first 48ms of the note (via an amplitude envelope), then gradually engaged for the remainder of the duration.


The complete clarinet instrument is shown in Figure 6 with the harmonic weightings of the transfer functions in Figure 7. Modular components are represented by dotted lines. The crossfade envelope and the inverse UG segue waveshaping module one into waveshaping module two. The crossfade begins after 48ms and continues for the remainder of the note. Therefore, for the first 48ms, waveshaping module one is full amplitude with module two at an amplitude of zero. The random vibrato module controls the pitch of waveshaping module two. Along with the vibrato in waveshaping module two, there is beating between module two and the static (non-vibrato) module one, as the crossfade occurs. Transfer function "f4" is the original from the spectral plot, with "f5" slightly different, creating two unique spectra.

                     Figure 6: Complete clarinet instrument

                         Figure 7: Waveshape transfer functions ("f4" and "f5")


Clarinet-glissando Instrument

This instrument contains a pitch transition function and utilizes the timbral crossfade concept to obtain the desired timbre. The two waveforms used in the timbral crossfade are the clarinet waveform (via one waveshaping module), as the attack portion, and the plucked string timbre (via fltdelay), for the steady state and decay portion of the note. The random vibrato was added to the second waveform (the plucked string portion). The glissando was attained with a transition UG outputting to both waveforms. Like the guitar-glissando instrument, the transition unit generator required both start pitch and end pitch arguments, which were declared in the note statement. The waveshaping module using transfer function "f4" and the random vibrato module are again used, with the entire instrument represented in Figure 8. The crossfade envelope function is identical to that used in the clarinet instrument. The fltdelay argument values were adjusted to give the steady-state portion a live guitar sound, and to fuse with the clarinet portion of the note. All the sharpness of the pluck was also removed. The final argument, which determines the rate of attenuation, was adjusted to a very low value. This adjustment lessened the decay and allowed the target pitch in the glissando to be more apparent.

Pedal Instruments


The myriad possibilities of combining timbres to form new timbres led to the design of two instruments, both using fltdelay functioning like a piano sustain pedal, to give the instruments more resonance. One instrument used the clarinet timbre for its attack portion while the second instrument used the guitar timbre for its attack. Both instruments then used fltdelay for the steady state and decay portion of the note. For descriptive purposes, the steady state and decay portion of the note (using fltdelay) is termed the "pedal" portion.


Both instruments have their attack portions attenuated after a short period to allow the pedal portion to emerge. In the guitar-pedal instrument this is accomplished by adding an amplitude envelope that attenuates the attack portion after 100ms. The original amplitude envelope in the clarinet waveshape module (represented by "f2" in Figure 4) contained a variable steady state time value that adjusted automatically to fill in the duration of the note. For the clarinet-pedal instrument, this portion of the envelope was fixed to 151ms, giving the clarinet attack a total duration of 330ms.


The fltdelay argument values representing the pedal portion were adjusted to remove any sharpness of the pluck and to simulate the steady-state and decay portion of an undamped piano string. Both instruments have the same fltdelay argument values with the exception of the level argument, which is .8 in the guitar-pedal instrument and 2.5 in the clarinet-pedal instrument. These arguments were adjusted to maintain proper balance of attack and pedal portions in their respective instruments. A table of pedal argument values is found in Figure 9, with the complete guitar-pedal and clarinet-pedal instruments found in Figure 10 and Figure 11 respectively. In the guitar-pedal instrument, fltdelay2 represents the pedal.




                     Figure 8: Clarinet-glissando instrument



                       Figure 9: Pedal (fltdelay) argument values




                Figure 10: Guitar-pedal instrument



                  Figure 11: Clarinet-pedal instrument


Snare Instrument


To round out the ensemble several percussion instruments were designed. This instrument was modeled after a snare drum sound, with a sharp attack, quick decay, and an inharmonic spectrum. It is found in Figure 12. The synthesis method used in this instrument is frequency modulation[8], also known as FM, which is a very efficient means of generating sounds with rich spectra. The basic technique is to vary a carrier wave with a modulating wave, thus changing the instantaneous frequency of the carrier wave. The rate at which the carrier wave is varied is termed the modulating frequency. Two fundamental parameters determine the output spectrum: the ratio of carrier to modulating frequency, which determines the position of components in the spectrum, and the modulation index, which controls the number of components that will have significant amplitude.



                        Figure 12: Snare instrument


Whole-number ratios of carrier to modulator result in harmonic spectra, while fractional ratios result in inharmonic spectra. The fractional ratio of 1/√2 was ideal for this timbre. The modulation index, I, is determined by I = d/m, where d is the frequency deviation and m is the modulating frequency. After much experimentation a value of 17 was used for the modulation index.


The amplitude envelope consisted of an extremely sharp attack of 1ms, with no steady state, and a quick decay of 25ms, summing to a total duration of 26ms. This same envelope was also applied to the index, which time-varied the spectrum. One final adjustment was necessary with this instrument that was related to the fundamental pitch. The above modulation index produced good results for a specific range of pitches (centered around middle C), but pitches out of this range began to take on different timbres. Since this instrument was non-pitched, percussive, and had an inharmonic spectrum, changes in fundamental pitch affected its timbre more than its pitch distinction. To remedy this problem the note statement pitch declaration was permanently fixed to sound a middle C, regardless of what pitch network designated.



Wood Instrument


Another percussion instrument was designed, referred to as wood. In the higher pitch register it has a wood-block sound, although in the lower register it is more suggestive of a tom-tom's timbre. Unlike the snare instrument, this one was pitched, and therefore contained no pitch restrictions. The design was very simple: a sine wave carrier with an amplitude envelope identical to the snare instrument's. It is represented in Figure 13.




                     Figure 13: Wood instrument





Computer Aided Composition


The algorithmic approach to composition that interested me most was using the computer to generate music based on a supplied input. In musical terms this is analogous to theme and variations, hence the title of the composition. After this approach was decided, several questions had to be addressed in designing the algorithm. This chapter will answer those questions then explain the design and mechanics of the network compositional algorithm.




The first decisions to address were:


1. What musical parameters should be included in the network program?

2. What factors determine the decision process?


In the beginning stages of network design it varied only one parameter, which was the melodic contour. That was improved by adding a second parameter of rhythm, but remained inadequate for an interesting musical description. The network program design finally kept track of six musical parameters: pitch information (melody and harmony), rhythmic information (start times and duration), amplitude, and timbre.


The basic model used for the decision making is a Markov process[9] (also referred to as a Markov chain), where a first order Markov process is defined as:


A finite number of possible states of a system, S_1, S_2, ..., S_n, with a set of transition probabilities, p_i(j), the probability that if the system is in state S_i it will next go to state S_j.[10]


Markov chains are also described by degrees of order. A first-order Markov chain describes the probability of the next state being dependent on the current state. A second-order Markov chain describes the probability of the next state being dependent not only on the current state but also on the preceding state. This would have been too limiting for my purposes and created variations that were too closely related to their source material (theme). A zeroth-order Markov chain describes the probability of the next state being independent of the current or preceding states, and is equivalent to a probability distribution of the state space. This was unacceptable, for it would have created variations with little or no familiarity to the theme. The first-order Markov chain then was to be the working model, for it offered the ideal amount of variation with a necessary amount of coherence.


In the Markov definition the term state can represent anything the user decides, provided it is applied consistently. This choice can also affect the outcome greatly and thus requires further refinement. For pitch material there are essentially two state choices: using the pitches themselves as the states, or using the interval between the pitches as the state representation. Using the pitch as the state representation offered one serious drawback--the variation would contain no two-note pitch sequence that was not present in the input. By using the interval between pitches as the state representation, possible two-note pitch sequences not present in the input could occur. This was a far more attractive possibility, so for pitch material the interval would be the state representation.


To define rhythmic information two parameters were needed: the actual duration of a note, and when it starts. The duration is an interval of time and therefore used as its state representation. Start times were described as intervals of time between adjacent notes, with simultaneous pitches having interval values of zero. The state representation for the amplitude of each note was to be its actual value, as opposed to the change of amplitude between successive pitches. The timbre parameter is actually a discrete choice for different computer music instruments. Its state representation is a numerical label for each different instrument. Because six parameters were involved, a hybrid first order Markov chain was used. It will be explained completely in a later section.



Data Representation of Musical Notation


In tandem with the items discussed in the previous section was the question of how to represent the complicated nature of music with numbers. Just as the state definition in the Markov process is critical to output determination, so is the data analog of musical notation. The first objective was to have data compatible with cmusic style note statements. This is to say that each line of data contains all the parameters necessary to describe the complete musical event (like a note statement). Another objective was to have each line of data represent events that were chronologically greater than or equal to event descriptions from the preceding line. If music was simply horizontal or one dimensional (such as melody with no harmony), the above representation would be simple to achieve. A problem in representation arose when adding the second dimension or vertical nature of music (specifically harmony and counterpoint). In short, a one dimensional (linear) representation of a two dimensional system (melody-horizontal, harmony-vertical) was needed.


The six parameters describing the complete musical event are labeled:


1. Interval: distance between adjacent pitches. Represented by integer values, with positive and negative values representing ascending and descending intervals respectively. A zero represents no change of pitch. Twelve tone tempered scale was used, so a 1 represents an ascending half step, a -7 represents a descending perfect fifth, etc.


2. Class: primarily used to differentiate harmonic and melodic pitches. The letter a represented a melodic pitch, the letter b a pitch to be sustained much longer than its stated duration, and the letter c represented a harmonic pitch.


3. Duration: length of a note. Floating point numbers used to represent actual time in seconds.


4. Delay: interval of time between adjacent notes. Again floating point numbers used to represent interval time in seconds.


5. Timbre: numerical representation of a unique instrument.


6. Amplitude: loudness of each note. Integer values used to represent decibels below 0dB (the maximum amplitude used in cmusic).



The key to realizing the above linear representation was accepting a value of zero for the delay parameter, which allows for more than one event (note) occurring at the same time. Figure 14 is a section of the Theme, with the numerical representation in Figure 15. The upward ties in the musical example represent the conventional use of the tie, while the downward ties represent notes to be sustained longer than their stated duration. For consistency, the six parameters must be ordered like the example in Figure 15.

The interval and delay parameters describe the position of the next note, in the pitch and time domains respectively. The class parameter describes what type of pitch it is. The remaining parameters describe the current note's status. In line 1 of Figure 15 the interval parameter states the next note will be a half step below the current note, with the delay parameter signifying the next note will start .5625 seconds (three sixteenths at quarter note = 80 M.M.) after the current note. The letter a under the class parameter signifies this is a melodic pitch. The remaining parameters describe the current note (the first Bb in Figure 14), specifically its duration, amplitude, and timbre.

The numerical representation for duration was often increased a small amount over its respective musical representation. This was done to create a more legato feel in the music. Compare the first four notes of Figure 14 with the first four lines of Figure 15: the durations equal the delays in the music but are increased slightly in the numerical representation.

Lines 5, 6, and 7 represent a vertical event and describe the three-note chord (C#, D, and F) in Figure 14. This is accomplished by having delay values of zero in lines 5 and 6, and the harmonic class of pitch represented by the letter c in lines 5, 6, and 7. The letter b in the class parameter represents pitches to be sustained much longer than their stated duration, analogous to an engaged piano sostenuto pedal. The use of the sustained class of pitch is separate from the legato technique explained earlier. Since the interval representation for pitch determines the next pitch, a starting pitch must be specified in the output. This representation does, however, make transposition easy: it is accomplished by simply choosing a new starting pitch.


                      Figure 14: Section of Theme


                          Figure 15: Numerical Representation of Theme Section



Data Structure Representation: Basic Concept


For easier understanding, I will give a simplified model of the data structure used, then explain the actual model implemented. Essentially two types of linked lists were utilized. This can be compared to a matrix, with the horizontal list containing unique events, and below each event node a list of successor nodes. Each successor node, associated with a particular event node, points to where that event goes to next. Suppose the event is defined as an individual letter in a word. Figure 16 shows a pictorial representation of the data structure for the word concentration. The horizontal axis (represented by squares) contains the unique letters of the word. Each time a new event is input, a check is made for duplicates against the members already present on the event list. If a duplicate is encountered, the original member's location is used; otherwise a new event node is created.

Each event node contains a successor list. A successor node (represented by a circle) is a pointer to a succeeding event, and is added to the appropriate successor list for each event that has a follower. The successor node therefore links each event with its successor. Output is accomplished in this model by selecting a principal node to start with, then randomly choosing a successor node below that principal node to determine the next principal node. So in Figure 16 the letter c would always be followed by the letter o or e, the letter o would always be followed by the letter n, etc. In the word Mississippi, notice the letter i is followed by the letter s twice and the letter p once. The successor list incorporates this statistic, which means there is a 2/3 chance the i will be followed by an s, and a 1/3 chance it will be followed by a p. Also, a successor node may point to the event node it is below, as in the case of an s being followed by an s in Mississippi. The last event of an input is considered to have no follower, and does not wrap around to the beginning.





                     Figure 16: Simplified data structure



Complete Markov Representation


With six independent musical parameters, several Markov state representations were possible. The simplest one would be to include all six parameters in defining one event. Duplicates would occur only if all six parameters were equal. If one or more parameters were different it would be necessary to create a new node, and it was usually the case that at least one parameter had changed. The drawback with this type of state representation is that the variations remained too similar to their input. It became apparent that to get interesting variations the state representation should contain more duplicates. With duplicate entries, you add to the successor list instead of the event list, thereby giving more chance of branching into new sequences. The solution was to create an event structure with fewer parameters to check in the duplication test. This led to the design of a principal event node involved in the duplication check, with each principal node creating a sub-list of the remaining parameters to choose from, not involved in the check.


One of the more important musical parameters was the interval, so together with the class parameter it was placed in the principal event node. This meant that the duplication check involved only those two comparisons, and this condition immediately manifested itself in fewer principal nodes (more duplicates). Along with the successor list, each principal node contained a sub-list, with each node of that list containing the four remaining parameters acquired from either the original member's creation or from a duplicate entry. In generating variations, each time the principal node was accessed for interval and class values, the remaining parameter values were randomly chosen from that principal node's sub-list.


The above concept was extended one step further by having each principal node contain two sub-lists. This resulted in greater possible permutations of the parameters in the output. Again the principal node contained the interval and class information, but one sub-list contained the rhythmic information of duration and delay while the second sub-list consisted of timbre and amplitude information. Each sub-list was accessed independently by a random choice for the necessary parameters. The two-sub-list state representation was to be the working model for the network data structure.


For clarification Figure 17 depicts a sample usage of the preceding three Markov state representations. Two events (using abbreviated words for the six parameters) are declared with upper-case and lower-case words representing unique items (e.g. DUR-dur). The events are considered to have successors and predecessors represented by the dotted lines in the diagrams. The triangles represent nodes of the sub-list(s), with the squares and circles representing principle event nodes and successor nodes respectively.

In the above working model, the Markov chain was functioning on the principle event node only, with the sub-list parameters not involved in the decision process. This created a specific type of network data structure, which in this case was intervallic. Allowing other parameters to reside in the principle node, and relegating the interval parameter to a sub-list, opened up the possibility of different types of variations. Two additional types of variations were added, timbral and rhythmic. In the timbral data structure representation, the timbre parameter resides in the principle node, with the interval parameter moving to the timbre sub-list: in short, the two parameters switch places. Likewise, in the rhythmic data structure representation, the interval and delay parameters change places. In both the timbral and rhythmic data structures the class parameter not only represented a harmonic-melodic intervallic distinction but really differentiated between a horizontal or vertical event, encompassing all parameters. Therefore, if the class parameter were moved from the principle node to a sub-list it would not be involved in the event comparison and would thereby be rendered useless.


Program Design


The program is coded in the C language and is separated into two primary modules. The implementor module consists of the actual network data structure and is called by the user module. Separation of this type is extremely practical because each module can remain independent of the other's details.


Network Data Structure


The network data structure is divided into three functions: create_network(), insert(), and run_variation(). The create_network() function creates the data structure header consisting of two pointers to the top and bottom of the principle node linked list. These two pointers are initially set to NULL. This function returns a pointer to the network data structure header, and must be called first by the user program before any other network function can be utilized.
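A minimal sketch of the header and create_network(), with illustrative names (the original source is not reproduced here):

```c
#include <stdlib.h>

struct Principle;                  /* principle node type, defined elsewhere */

typedef struct Network {
    struct Principle *top;         /* head of the principle node linked list */
    struct Principle *bottom;      /* tail of the principle node linked list */
} Network;

/* Must be called before any other network function. */
Network *create_network(void)
{
    Network *net = malloc(sizeof *net);
    if (net == NULL)
        return NULL;               /* allocation failure */
    net->top = net->bottom = NULL; /* both pointers initially set to NULL */
    return net;
}
```

The returned pointer is then handed to insert() and run_variation() by the user module.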





                      Figure 17: Three Markov state representations


The actual building of the data structure happens in the insert() function. Each call to this function parses the six parameters (describing one complete musical event) into the appropriate lists, based on the type of variation specified. The algorithm for this function is as follows:


1. Check new data for duplication on principle node list, if found go to 3.

2. Add new principle node to principle node list.

3. Add remaining data to appropriate sub-lists.

4. Update previous principle node's successor list to point to present principle node. Go to 1.
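The four steps above can be sketched in C roughly as follows. This is a simplified illustration, assuming one combined sub-list and file-scope globals for the list heads; all names are hypothetical.

```c
#include <stdlib.h>

typedef struct Sub  { double dur, del, timbre, amp; struct Sub *next; } Sub;
typedef struct Succ { struct Node *to; struct Succ *next; } Succ;
typedef struct Node {
    double interval; char cls;     /* the two duplicate-checked parameters */
    Sub  *subs;                    /* remaining parameters, not checked    */
    Succ *succ;                    /* Markov successor list                */
    struct Node *next;
} Node;

static Node *head = NULL;          /* principle node list  */
static Node *prev = NULL;          /* previously inserted principle node   */

void insert(double interval, char cls,
            double dur, double del, double timbre, double amp)
{
    /* 1. check new data for duplication on the principle node list */
    Node *n;
    for (n = head; n != NULL; n = n->next)
        if (n->interval == interval && n->cls == cls)
            break;

    /* 2. no duplicate found: add a new principle node */
    if (n == NULL) {
        n = calloc(1, sizeof *n);
        n->interval = interval;
        n->cls = cls;
        n->next = head;
        head = n;
    }

    /* 3. add remaining data to the sub-list */
    Sub *s = malloc(sizeof *s);
    s->dur = dur; s->del = del; s->timbre = timbre; s->amp = amp;
    s->next = n->subs;
    n->subs = s;

    /* 4. previous principle node's successor list points to this node */
    if (prev != NULL) {
        Succ *c = malloc(sizeof *c);
        c->to = n;
        c->next = prev->succ;
        prev->succ = c;
    }
    prev = n;
}
```

Note that a duplicate event grows the sub-list and the predecessor's successor list but builds no new principle node, which is exactly the behavior that encourages branching.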


One useful representation for gauging algorithmic run time is known as O-notation[11] (sometimes referred to as "big-oh"), and depicted by O(f(n)), with O symbolizing "run time proportional to" and n standing for the data size. So O(n) represents a run time proportional to n, while O(n²) represents a run time proportional to n².


With n equal to the number of input events, the complete insert() operation runs at O(n²), because the principle node list must be traversed each time to check for duplicates. This could be improved by storing the principle nodes in a binary search tree[12, p. 99], which would reduce the total run time to O(n log₂ n).
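The suggested binary search tree improvement could be sketched as follows, keying the tree on the (interval, class) pair so each duplicate check descends one branch instead of scanning the entire principle node list. Names are illustrative.

```c
#include <stdlib.h>

typedef struct TNode {
    double interval; char cls;     /* the duplicate-checked key pair */
    struct TNode *left, *right;
} TNode;

/* Three-way comparison on the (interval, class) key. */
static int keycmp(double i1, char c1, double i2, char c2)
{
    if (i1 != i2) return i1 < i2 ? -1 : 1;
    return (c1 > c2) - (c1 < c2);
}

/* Return the principle node for (interval, cls), creating it if absent.
   Average cost is O(log n) per call instead of a full list traversal. */
TNode *find_or_insert(TNode **root, double interval, char cls)
{
    while (*root != NULL) {
        int c = keycmp(interval, cls, (*root)->interval, (*root)->cls);
        if (c == 0)
            return *root;          /* duplicate found */
        root = (c < 0) ? &(*root)->left : &(*root)->right;
    }
    *root = calloc(1, sizeof **root);
    (*root)->interval = interval;
    (*root)->cls = cls;
    return *root;
}
```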


The function run_variation() generates the complete variation data. The user module makes one call to this function, which, based on command line arguments, creates the appropriate type of variation, length, starting point, etc. This function has a total run time of O(n × m) (n = output data size, m = size of longest successor list). In the present design, each successor list is represented by a linked list. After the first item is output, one traversal operation of the necessary linked list is required for each succeeding item output. This is a costly design, as represented by the m in the preceding run time analysis. The linked lists could be replaced by arrays utilizing indexes, removing the costly traversal step and resulting in an improved run time of O(n). The sub-list construct also uses linked lists that could be changed to arrays. The algorithm is as follows:

1. Traverse principle node list to find requested starting point. If found go to 2. If not found, exit.

2. Using random numbers as list indexes, traverse sub-lists for remaining parameters data.

3. If principle node successor list = NULL, exit.

4. Run limit test on intervallic parameter:

(a) If intervallic parameter is on principle node: use random number as index to next principle node via successor list. If the new interval satisfies the limit, go to 5, else repeat the procedure. If continued attempts to satisfy the limit fail, exit.

5. If principle node is intervallic, go to new node found in 4(a), else use random number as index to next principle node via successor list.

6. Output data. Go to 2.

Occasionally in early variation experiments, several large intervals of the Theme would group together and exceed either the lower or upper registers of the audio spectrum. It became necessary to impose registral boundaries to correct this. These boundaries were only needed on the intervallic parameter, and in the above algorithm this is called a limit test. Both lower and upper boundaries could be requested in the command line arguments. The limit test worked by checking to see if the new value chosen would exceed the limit; if so, another value was picked. It was possible that no new value would satisfy the limit requirements, and if this occurred run_variation() would exit early.
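The limit test can be sketched in isolation like this, with the successor intervals flattened into an array for simplicity; the names and the retry bound are hypothetical.

```c
#include <stdlib.h>

#define MAX_TRIES 64               /* hypothetical bound on retry attempts */

/* Pick a random successor interval that keeps pitch + interval inside
   [low, high].  On success *ok is set to 1 and the interval returned;
   after MAX_TRIES failures *ok is set to 0, signalling run_variation()
   to exit early. */
int limit_test(int pitch, int low, int high,
               const int *succ_intervals, int nsucc, int *ok)
{
    for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
        int iv = succ_intervals[rand() % nsucc];   /* random successor */
        if (pitch + iv >= low && pitch + iv <= high) {
            *ok = 1;
            return iv;                             /* limit satisfied */
        }
    }
    *ok = 0;                                       /* no value fits */
    return 0;
}
```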


It was also possible to encounter a principle node with no successor list. This occurred when the last line of input data was a unique event, which required a new principle node to be built. This new node would therefore have no follower and if reached in run_variation(), would require an early exit. This check is represented in the above algorithm by statement 3.

User Module

This module is the interface between the user and the network data structure. It interprets the command line arguments, opens and closes the necessary files, and performs the input routine. Data input and output are kept in ASCII files. At present a somewhat limiting data format must be observed, requiring each musical event to be described by five floating point numbers and one character, arranged like the example in Figure 15. A more flexible input routine would accept different syntaxes, such as dm7 (descending minor seventh) instead of just -10, or flute rather than a numerical representative. With a better input routine it could read cmusic note statements directly instead of an intermediary set of statements. My motivation was one of practicality and economics; however, a better input routine will be implemented in the future.

Additional Tools and Musical Applications

Note Writing Software


The interface between the instrument definitions and network data are cmusic note lists. The vehicle for converting network output into note lists is a program that I designed, called notewrite. Below is an example of output from notewrite with its input provided from the network data format example given in Figure 14.


The note lists were kept in unique files representing individual voices of each variation. The interval parameter defines the distance to the next pitch, so the first pitch must be declared. In this example, the Bb (466.163762Hz) was declared on the command line of notewrite and the remainder of the pitches were determined by the interval parameter. The delay parameter functions the same way, sequencing the starting time of the next note, so in the first line the start time (delay) is 0. The remaining parameters control the current note.


note 0.000000 ins4 0.576923 -12dB 466.163762Hz; {0/12}

note 0.562500 ins4 0.576923 -12dB 440.000000Hz; {-1/12}

note 1.125000 ins4 0.200000 -4dB 554.365262Hz; {4/12}

note 1.312500 ins4 0.576923 -12dB 261.625565Hz; {-13/12}

note 1.875000 ins8 1.125000 -12dB 349.228231Hz; {5/12}

note 1.875000 ins7 1.125000 -12dB 293.664768Hz; {-3/12}

note 1.875000 ins7 1.125000 -12dB 277.182631Hz; {-1/12}

note 2.250000 ins4 0.428571 -8dB 391.995436Hz; {6/12}

note 2.625000 ins8 0.750000 -6dB 659.255114Hz; {9/12}

note 2.812500 ins8 0.750000 -6dB 622.253967Hz; {-1/12}

Rules To Play By


Once all the tools were designed, how were they to be used? A better question is how network is used, since it really controls what instrument is to sound. The controlling tool, then, is network. The first step was to supply network with an input. The Theme was written with network in mind and, for the curious, written the old-fashioned way with no computer assistance. The Theme was conceived as two voices, to be played in separate speakers (stereo). Each voice was converted to data separately rather than together with the other voice, which meant the vertical relationship between the voices was theoretically not retained. The Theme is found in Appendix A, and is subdivided into three musical sections: A, B, and C. The top voice is termed voice 1 and the bottom voice 2, so A2 would represent the A section of voice 2. Also notice that in the score only the snare and wood instruments are specified. The remaining instruments were specified as the score was transcribed to data representation, and the amplitude values were also determined during the transcription.


There were seven rules guiding the composition of the variations.

1. Network can be applied to all or part of each voice of the Theme .

2. Network can be applied to all or part of each voice of a variation.

3. Single voice variation material can be linked together to form longer material.

4. Completed single voice variation material can be transposed.

5. Tempo adjustments were permitted to variation material.

6. Each variation must contain two voices, with alignment of the voices controlled by the user.

7. All variations must contain material only derived from network and bounded by the above rules.


The above rules imply that, except for the Theme, no additional "hand" composed material may be used, and the variations cannot contain direct excerpts from the Theme unless derived through network.


Node Count and Node Ratio


The entire piece contains five variations, the shortest lasting about 30 seconds and the longest just over 2 minutes. Each variation was derived in a different manner using the seven rules. One important statistical piece of information kept by network is the node count or node ratio. In the network data structure, each unique piece of data requires a new principle node to be built, whereas each similar piece of data requires a new successor node to be built below the original principle node. Conceptually then, fewer principle nodes would increase the number of branching successor nodes and increase the possibility of new sequences being born. The node count, then, is a count of principle nodes, and the node ratio is the ratio of data items (number of notes input) to principle nodes. If the ratio equals one, then for each note input a principle node was created (every data item was unique), and the output would equal the input. A higher ratio value means there are fewer principle nodes, which increases the probability of a variation less related to its input.
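The node ratio described above reduces to a single division; a trivial helper makes the definition concrete (a ratio of 1.0 means every input event was unique):

```c
/* Node ratio: input events divided by principle nodes built.  Higher
   ratios mean more duplicates, more branching, and a variation less
   related to its input. */
double node_ratio(int events_input, int principle_nodes)
{
    return (double)events_input / principle_nodes;
}
```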



                                Figure 18: Theme Node Ratios of different variation types


In this piece, using the same data with different variation types results in different node ratios. Figure 18 charts the node ratios of the two voices of the Theme under the different variation types. As might be expected, the timbral variation has a higher node ratio, because there are fewer instruments than interval values and hence more duplication. Accordingly, the timbral variations were quite often less related to their input than the respective interval variations. In between the interval and timbral node ratios is the rhythmic node ratio. Again this is predictable, because there are fewer unique rhythmic values than interval values, yet more than the timbral values. This information was very valuable in gauging the degree of transformation. Figure 19 gives additional node statistics for Theme sub-sections.


Additional Information


One other important control over variation output was the limit values. Two values, low and high, determined the respective distances the pitches could travel from a starting position. This was useful in controlling register ranges for variation material. More important, it kept the range within the audio spectrum, which is the principal reason for its use. Throughout the piece the limits were set with middle C as the starting position, and the low and high values corresponded to the respective ends of the piano keyboard.
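Assuming middle C (C4) as the starting position, the piano-keyboard bounds work out to 39 semitones below (A0) and 48 semitones above (C8); a small sketch with hypothetical names:

```c
/* Hypothetical limit settings matching the description above: middle C
   (C4) as the starting position, low/high at the ends of a standard
   88-key piano, measured in semitones from C4. */
enum {
    LIMIT_LOW  = -39,  /* A0: 39 semitones below middle C */
    LIMIT_HIGH =  48   /* C8: 48 semitones above middle C */
};

/* A running pitch offset (in semitones from middle C) passes the limit
   test when it lies inside [LIMIT_LOW, LIMIT_HIGH]. */
int within_limits(int offset)
{
    return offset >= LIMIT_LOW && offset <= LIMIT_HIGH;
}
```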


                         Figure 19: Theme sub-section node ratios of different variation types


Individual Variations: How They Were Derived


The following explanations are placed in the chronological order in which the variations were created. They are ordered this way because some variations are based on preceding variations. The performance order of the variations is based on musical balance rather than chronological order of creation. The performance order of Theme and Variations is: Theme, ABC12iv, Partab, Piece12, Sect12, and A1t2iv. The variation names themselves evolved out of bookkeeping and identification purposes rather than aesthetics.



ABC12iv


This was the first variation created and uses the most elementary technique of network composition. The intervallic network was applied to ABC1 (voice 1 of the Theme), and output 200 events to supply voice 1 of this variation. The intervallic network was also applied to ABC2 and output 210 events for voice 2 of this variation. Voice 1 had a starting pitch of Bb4 and voice 2 began on E2. For alignment, the start of voice 2 was delayed four seconds after the start of voice 1.


One important factor in determining the alignment of voices, as well as transposition and tempo adjustment of single voices, was the "non-realtime" nature of the CARL system. The best strategy in computing each variation was to compute each voice separately and then digitally mix the voices, which required another computation. This meant that realizing each variation required a minimum of three computations. Experimenting with different alignments required a new mix computation that typically took between 30 minutes and an hour (actual time), depending on the length of the variation and the computer's load. Individual voice computation was time consuming and ranged between 18 and 24 hours, again depending upon the length of the voice and the computer's load. Transposition and tempo adjustment required a new computation for that voice plus the mix computation. The above times were based on the sampling rate of 48K; a sampling rate of 16K would run approximately three times faster. In short, experimentation was kept to a minimum because of the additional computation time.



A1t2iv


For this variation, input was limited to only the "A" section of the Theme. Different variation types were also combined in this variation. Voice 1 was obtained by applying the timbral network to A1 with an output of 200 events. The intervallic network was applied to A2 and output 260 events for voice 2. Notice the node ratios in Figure 19 for gauging intervallic and timbral transformation possibilities. Like variation ABC12iv, this one had the same starting pitches for voices 1 and 2 of Bb4 and E2 respectively. Voice 2 begins 7.4 seconds after the beginning of voice 1.



Sect12


This variation involves a new technique, termed windowing, that is an extension of the preceding variation's use of a limited input. In fact, this variation uses a small section of the preceding variation as its input, so this is the first application of rule 2. From voice 1 of the preceding variation (A1t), events 153-170 were copied to form source file "s1". From the same voice, events 153-188 were also copied to create a second source file, "s2". Windowing means using small sections (or windows) of an input for source material, with adjacent windows containing some similar events. In the above source files, s1 is a subset of s2, or a smaller window inside a larger one. If you were to alternate applying network to s1 and s2 several times and string the output together, the result would favor s1 yet have instances of the unique events of s2 (171-188).


Both voices in this variation were derived from s1 and s2, giving it a more noticeable coherence between the voices than the preceding variations. The intervallic network was applied to s1 twice, using different random seeds, giving two unique outputs termed "s1iv" and "s1iv2". The intervallic network was applied to s2, generating "s2iv". A timbral network was applied to s2, generating "s2tv". The durational and delay parameters of s1iv2 were time expanded by 1.5 (tempo adjusted) to form "s1iv2te". Likewise, s2tv was time expanded by 1.5 to form "s2tvte". Voice 1 was the combination of s1iv and s2tvte. Voice 2 was comprised of s1iv2te and s2iv. Both voices started at the same time with the same starting pitch of middle C.



Partab


This variation is my personal favorite and, like Sect12, uses A1t2iv as its input material. This variation was designed to be chordal and slower moving than the others, and to accomplish this the appropriate source material was extracted from A1t2iv. From A1t, events 46-59 were copied to form source file "z1". From the second voice of A1t2iv, events 35-53 were copied to form file "z2" and events 251-260 for file "z3". Over each source file, several intervallic and timbral types of network were run, creating many individual variation files. Each voice was comprised of stringing together random combinations of these variation files. Both voices started together, with voice 1 beginning on C4 and voice 2 on F4.



Piece12


Like Partab, each voice of this variation was made by stringing together random combinations of variation material. The input material was not as limited as Partab's, though, which gives this variation a more amorphous quality. This variation is also the only one to use the rhythmic network. The Theme was returned to for source material, utilizing subsections A1, B1, A2, and B2. Over each source file, several intervallic and rhythmic types of network were run, providing new variation material, which was randomly combined to form the two voices. Voice 1 starts three-tenths of a second later than voice 2, and begins on D4. Voice 2 begins on middle C.



One criterion in rating an algorithmic compositional system is the ratio of success to failure. Indeed, for the five successful variations produced, there were two failures. This piece represents the first application of network and, just as important, the first use of the seven rules governing the composition of the variations. Out of curiosity, I experimented freely throughout the creation of the piece, not only to generate music but to discover the system's limitations. Equally as important as network was the effect the seven rules had on the variations. My curiosity to determine some of the system's boundaries, with respect to the seven rules, led to the two failures. In retrospect, the seven rules are extremely liberal.


One of the more interesting discoveries in using network was its effect on structure. Since it was applied to each voice separately, one would expect it to retain structural characteristics in a single voice output, as indeed it did, while theoretically it should not retain structural relations between voices. Amazingly, it did retain a structural cohesion between voices in each variation. More often than not a contrapuntal or conversational interaction was present between the voices of each variation, suggesting an intelligence at work as opposed to an amorphous, independent, or unconcerned quality between the voices. One important determinant of the effect is the source material itself (the Theme). This effect would probably not occur in a more classical or hierarchical environment, and it would be interesting to see if the results retain a similar cohesion in a serial environment. One hypothetical explanation for this structural cohesion is that, although a random choice is used in the decision process, the network data structure maintains a time history of events. The time history is kept in the successor nodes: each node appended to the list represents an event that occurred chronologically later than its predecessor.


Backus, John. The Acoustical Foundations of Music. W.W. Norton, New York, 1977.


Knuth, Donald E. The Art of Computer Programming; Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, 1973.


Kochan, Stephen G. Programming in C. Hayden Book Company, Rochelle Park, New Jersey, 1983.


Kernighan, Brian W. and Ritchie, Dennis M. The C Programming Language. Prentice-Hall, Englewood Cliffs, New Jersey, 1978.


Moore, F. Richard. Programming in C with a Bit of Unix. Prentice-Hall, Englewood Cliffs, New Jersey, 1985.


Standish, Thomas A. Data Structure Techniques. Addison-Wesley, Reading, Massachusetts, 1980.


Wirth, Niklaus. Algorithms + Data Structures = Programs. Prentice-Hall, Englewood Cliffs, New Jersey, 1976.

Appendix A: Theme

[1] F. Richard Moore. An introduction to the mathematics of digital signal processing; Part II: Sampling, transforms, and digital filtering. Computer Music Journal, 2(2):38-60, 1978.

[2] F. Richard Moore. The cmusic sound synthesis program. In CARL Startup Kit, Center for Music Experiment, 1985.

[3] Kevin Karplus and Alex Strong. Digital synthesis of plucked-string and drum timbres. Computer Music Journal, 7(2):43-55, 1983.

[4] David Jaffe and Julius Smith. Extensions of the Karplus-Strong plucked string algorithm. Computer Music Journal, 7(2):56-69, 1983.

[5] Curtis Roads. A tutorial on non-linear distortion or waveshaping synthesis. Computer Music Journal, 3(2):29-34, 1979.

[6] James Moorer. Signal processing aspects of computer music--a survey. Computer Music Journal, 1(1), 1977.

[7] Niklaus Wirth. Algorithms + Data Structures = Programs. Prentice-Hall, Englewood Cliffs, New Jersey, 1976, p. 154.

[8] John Chowning. The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society, 21(7):526-534, 1973.

[9] Lejaren A. Hiller and L.M. Isaacson. Experimental Music-Composition with an Electronic Computer. McGraw-Hill, New York, 1959.

[10] Claude E. Shannon and Warren Weaver. The Mathematical Theory of Communication. University of Illinois Press, Urbana, Chicago, and London, 1949.

[11] Donald E. Knuth. The Art of Computer Programming; Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, Massachusetts, 1973.