Metaspectra: Sums of imperfect parts
Justin Wright
May 7, 2022
Spectral music demands no shortage of humility from its components. Each instrument is asked to surrender its individuality and is demoted to a partial, a mere sine tone within a broader waveform. This is, of course, an unfair demand, because instruments are not sine tones; they are complex individuals with personalities that are begging for attention, with overtones that are often more pronounced than their fundamental, and harmonic spectra that can extend well beyond the range of human hearing. We must therefore consider not only the relative volume of each partial, but the aggregate spectrum that results from each of the combined instruments, which I will refer to as a metaspectrum. This paper will examine the role of the components’ timbre in the mimicry of another instrument’s harmonic spectrum, using a handful of model instruments and waveforms to determine whether it has a significant effect. If it does, how can we alter our metaspectra to better mimic our desired timbre, or better yet, can we exploit these supposed imperfections as a compositional tool?
For the sake of brevity, I chose two target instruments (instruments whose harmonic spectra I would try to replicate) with distinct and contrasting timbres. The clarinet (Fig. 1) is notable for having odd-numbered partials that are vastly stronger than its even-numbered partials. This presents an interesting challenge when we attempt to reconstruct its spectrum: is it even possible for a source instrument (a building block in the reconstruction of a target instrument’s spectrum) with a strong second partial to reproduce this even-odd disparity? The clarinet also has a very dark timbre, meaning that its overtones taper off relatively quickly as we get into higher frequencies, which presents another question: what happens to the high frequency overtones when bright instruments mimic a dark instrument?
The trumpet (Fig. 2), on the other hand, is brash, nasal, and bright, so it struck me as the perfect contrasting timbre to target. One of its most notable qualities is its nearly inaudible fundamental, with its overtones increasing in volume incrementally until approximately the fifth partial, after which they begin a gradual and near-linear (albeit in dB, a logarithmic unit) decrease. The upper harmonics extend past 20 kHz, the upper limit of human hearing[1], making the trumpet a very bright instrument. These characteristics, at first glance, suggest an optimal target instrument, since its initial upward slope on the spectrogram should make it more friendly to the overlapping peaks of its source instruments, and its full upper frequency range may forgive some accumulating overtones in that register.
My source instruments and oscillators, similarly, were chosen for their contrasting features. A sine tone (Fig. 3) represents the platonic ideal of a source instrument, because, at least mathematically, it is the simplest waveform to which any sound can be broken down using Fourier expansion[2]. A perfect square wave (Fig. 4) (which we will abbreviate as SqW), on the other hand, would be expanded into an infinite number of sine waves[3], and therefore serves as our model of an extremely overtone-rich synthetic tone.
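To make the square wave's overtone content concrete, here is a minimal sketch of its Fourier expansion. An ideal square wave contains only odd harmonics, with amplitudes falling off as 1/n. The function names are hypothetical and chosen for illustration; nothing here reproduces the Wavetable synthesizer's actual output.

```python
import math

def square_wave_partials(n_max):
    """Linear amplitudes of harmonics 1..n_max of an ideal unit square wave.

    Odd harmonics have amplitude 4 / (pi * n); even harmonics are absent.
    """
    return [4 / (math.pi * n) if n % 2 == 1 else 0.0
            for n in range(1, n_max + 1)]

def to_db(amplitude, reference):
    """Level of `amplitude` relative to `reference`, in decibels."""
    return 20 * math.log10(amplitude / reference)

partials = square_wave_partials(9)
for n, amplitude in enumerate(partials, start=1):
    if amplitude > 0:
        level = to_db(amplitude, partials[0])
        print(f"harmonic {n}: {level:.1f} dB relative to the fundamental")
```

The slow 1/n decay is what makes the square wave such a demanding source: its ninth harmonic is still only about 19 dB below its fundamental.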
Flute (Fig. 5) was chosen as our instrument analog of the sine tone, since it has a relatively strong fundamental, and a straightforward overtone series that quickly diminishes. The violin (Fig. 6) family was chosen as a very “impure” set of source instruments, but also as a source whose spectrum may be malleable through various playing techniques. While the spectrum of each note and string can vary significantly, many notes feature a fundamental that is quieter than its lower-numbered partials, and a decrease in volume that only begins in the double-digit partials, giving it a very bright timbre. The spectrum of harmonic notes (Fig. 7), however, is completely different, with a very strong fundamental, and overtones that very quickly diminish. A list of the audio sources used can be found in Appendix 1.
Let the construction begin
To create the basic reconstructions, I began by analyzing the target timbres in the application SPEAR (Sinusoidal Partial Editing Analysis and Resynthesis)[4], which created sine-wave renditions of the audio files. This allowed me to clean up the spectrum to remove superfluous data and barely audible overtones while ensuring that the timbre was preserved. In the interest of brevity, I will relegate the nitty-gritty details to Appendix 2. Using the spectral analysis application Raven Lite[5], I determined the amplitude of each partial that I had deemed necessary based on my earlier SPEAR analysis. Finally, in Ableton Live[6], I created groups of tracks of either synthesizers, sampled audio, or sampler instruments, depending on the instrument used, tuned each partial microtonally to the harmonic series, and set the amplitude of each track to reflect its respective partial. To ensure that our comparison focused on timbre rather than other characteristics of a note, an envelope follower was then used on the reconstructions to replicate the attack, decay, and sustain of the target instruments.
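The microtonal tuning step can be sketched numerically. Assuming each source is first set to the nearest equal-tempered pitch and then detuned, the code below gives the semitone offset and the residual in cents for harmonic n, along with the standard conversion from a measured level in dB to a linear gain. The function names are hypothetical; the workflow in Ableton Live itself was done by hand.

```python
import math

def harmonic_tuning(n):
    """Return (semitones, cents) placing harmonic n above the fundamental.

    `semitones` is the nearest equal-tempered interval; `cents` is the
    detuning still needed to land exactly on the harmonic.
    """
    cents_above_fundamental = 1200 * math.log2(n)
    semitones = round(cents_above_fundamental / 100)
    return semitones, cents_above_fundamental - 100 * semitones

def db_to_gain(level_db):
    """Convert a level in decibels to a linear amplitude gain."""
    return 10 ** (level_db / 20)

for n in range(1, 11):
    semitones, cents = harmonic_tuning(n)
    print(f"partial {n}: +{semitones} semitones, {cents:+.1f} cents")
```

The seventh partial, for instance, sits roughly 31 cents below its nearest equal-tempered pitch, which is well outside what a 1-cent tuning tolerance would let slide.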
The sine wave reconstructions of both the clarinet (Fig. 8b) and trumpet (Fig. 9b) were quite convincing. This comes as no surprise, because these reconstructions are simply replicating what had already been done in SPEAR. It does, however, reassure us that the levels set in Ableton Live are, in ideal circumstances, sufficient to reproduce their target timbres. What happens, now, if we use the overtone-rich square waves in these models instead of the pure sine waves? As our audio sample demonstrates, the outcome bears little resemblance to the target instrument. Both the clarinet (Fig. 8c) and trumpet (Fig. 9c) constructions retain little more than a faint ghost of their target instruments. As their metaspectra show, the lowest partials are actually quite faithful to their target levels, but the rest of the spectrum is saturated with very strong overtones extending past the human hearing range.
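One way to see why overtone-rich sources saturate the spectrum is to model the metaspectrum directly. The sketch below assumes that every source has the same harmonic profile and that coinciding harmonics add in phase, which is a worst-case simplification. The function name and the example levels are hypothetical, not measurements from the figures.

```python
def metaspectrum(target_levels, source_profile, n_max):
    """Aggregate harmonic amplitudes when sources stand in for partials.

    target_levels[k-1]: linear level given to the source playing partial k.
    source_profile[m-1]: level of the source's own harmonic m, relative to
        its fundamental.
    Harmonic m of the source on partial k lands on aggregate harmonic k*m.
    Assumes in-phase (worst-case) addition of coinciding harmonics.
    """
    aggregate = [0.0] * n_max
    for k, level in enumerate(target_levels, start=1):
        for m, weight in enumerate(source_profile, start=1):
            harmonic = k * m
            if harmonic <= n_max:
                aggregate[harmonic - 1] += level * weight
    return aggregate

# Hypothetical three-partial target with a silent second partial.
target = [1.0, 0.0, 0.5]
sine_source = [1.0]                          # fundamental only
square_source = [1.0, 0.0, 1 / 3, 0.0, 0.2]  # first five square-wave harmonics

print(metaspectrum(target, sine_source, 9))
print(metaspectrum(target, square_source, 9))
```

With the sine profile the aggregate simply equals the target. With the square-wave profile, extra energy appears at harmonics the target never asked for, and partials that were assigned a level receive additional contributions from the sources below them.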
I then created an intermediate source timbre by placing low-pass filters on each of the square waves. This retained some of the overtones, but not so many as to fully saturate the spectrum. The result (Figs. 10 and 11) was only marginally less faithful to the target sound than the pure sine wave reconstruction, suggesting that there is at least some tolerance for overtones if the fundamental is sufficiently established.
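The effect of the filter on each source can be approximated as follows. This is a sketch under stated assumptions: the filter is modeled with a Butterworth magnitude response, and the cutoff, order, and fundamental frequency are hypothetical values, since the settings actually used in Ableton Live are not recorded here.

```python
import math

def lowpass_gain(freq_hz, cutoff_hz, order=2):
    """Magnitude response of an idealized Butterworth low-pass filter."""
    return 1 / math.sqrt(1 + (freq_hz / cutoff_hz) ** (2 * order))

def filtered_square_partials(f0_hz, cutoff_hz, n_max, order=2):
    """Harmonic amplitudes of an ideal square wave after low-pass filtering."""
    amplitudes = []
    for n in range(1, n_max + 1):
        raw = 4 / (math.pi * n) if n % 2 == 1 else 0.0
        amplitudes.append(raw * lowpass_gain(n * f0_hz, cutoff_hz, order))
    return amplitudes

# Hypothetical settings: a source at C4 (about 261.63 Hz), 1 kHz cutoff.
for n, amplitude in enumerate(filtered_square_partials(261.63, 1000.0, 9), start=1):
    if amplitude > 0:
        print(f"harmonic {n}: {20 * math.log10(amplitude):.1f} dB")
```

Lowering the cutoff moves the source toward the sine-tone end of the continuum, while raising it restores the square wave's upper harmonics.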
There is a clear lesson that can be drawn from these very basic reconstructions using synthesizers: when mimicking a sound, be as much like a sine tone as you can be. It is clear that rich overtones in a source instrument can significantly stultify the potential of its new role as a partial. From this we can predict that darker instruments, with more prominent fundamentals and less information in the upper frequencies, are the best candidates for source instruments.
Three violins in a trenchcoat
Let us now see whether our predictions hold true with real instruments, or at least as real as we can get with my current limitations. Preparing instrument models came with its own challenges, because the loudness of a note can vary significantly within a single instrument. To normalize the volume of a note, especially when we are preparing to use it as a partial, do we adjust based on the volume of its fundamental, its perceived loudness, or the additive volume of its whole spectrum? In the end, I circumvented the question in the few cases in which it was necessary by making alternative constructions: I went back and forth between mixing the partials and checking the metaspectrum until the spectrogram looked as much like the target timbre as I could make it.
Can ten flutes sound like a clarinet? My construction would suggest otherwise. Although the metaspectrum (Fig. 12) shares some similarities with the clarinet’s spectrum, even after entirely leaving out the second and fourth partials, the even-odd disparity was simply impossible to recreate due to the strong second partial of the flute. Perhaps the clarinet just isn’t an instrument that is asking to be mimicked. The trumpet (Fig. 13), however, was recreated surprisingly well, though not perfectly, seemingly because of the excess of high frequency information. It seems that an ideal target instrument has steady slopes in the amplitudes of its partials, and an abundance of higher frequency overtones, because they will be there, invited or not.
Strings, unsurprisingly, did a terrible job of mimicking both clarinet (Fig. 14) and trumpet (Fig. 15). Although the two models are distinguishable from one another, ultimately, they both just end up sounding like strings. This is likely a combination of the clarinet’s difficulty in being replicated, and the bright, shimmering overtones of string instruments that seem to overpower anything else. If we recall Figure 7, however, we are reminded that string instruments have another trick up their sleeves that bears much more resemblance to a pure sine tone: harmonics. To my ears, string harmonics matched the flute in its ability to mimic both the clarinet (Fig. 16) and the trumpet (Fig. 17). Sure, the odd-partial pattern is still absent from the clarinet model’s metaspectrum, and the trumpet model is still far too bright, but there was something distinctly identifiable about the outcomes, and for that reason I would consider the string harmonics to be a modest success.
There are many caveats to consider with these digital reconstructions, some working in our models’ favor, and some working against. In Ableton Live I was able to precisely tune each partial to a perhaps unrealistic degree that could not be expected of any performer. Whether a bit less precision would ruin the effect is a whole other paper. There are some factors, however, that could make a real-world construction more convincing. I was limited to whatever freely available instrument samples I could get my hands on. Some of these needed significant tuning, and others needed their levels adjusted. Most significant of all, however, is that the samples were all close-miked, and anyone who has played around with mic placement in a studio knows that an instrument up close and an instrument on stage can have a vastly different spectrum, with distant instruments tending to have much more attenuated higher frequencies. This may explain why none of my instrument models were particularly convincing. In defense of my close-miked samples, however, their uncharitable nature allowed us to more easily emphasize the timbral differences between the instruments in question.
The moral of the story
Aside from choosing instruments or playing techniques that are as close as possible to a sine wave, there are some lessons in orchestration that we can draw from our various reconstructions. Most importantly, we must always consider how a source instrument’s timbre will affect the outcome in its metaspectrum. All harmonics are additive, so any redundancy may result in certain partials being overemphasized. This is especially true for octaves of the fundamental, which appear more frequently than any other partials, and line up more easily by being perfect intervals. This, however, could be advantageous when working with smaller ensembles such as string quartets, because the redundancy may allow us to omit certain partials entirely, such as the second and fourth partial when mimicking a clarinet.
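The redundancy described above can be checked with a small bookkeeping sketch. For a given harmonic of the aggregate, it lists which sounding source partials cover it through their own overtones. It assumes every source carries the same number of audible harmonics; the function name and the example ensemble are hypothetical.

```python
def contributors(harmonic, sounding_partials, source_harmonics):
    """List (k, m) pairs where harmonic m of the source on partial k
    lands on the given aggregate harmonic, i.e. k * m == harmonic.

    sounding_partials: partial numbers actually assigned to a player.
    source_harmonics: how many audible harmonics each source is assumed
        to have (an assumption, identical for all sources).
    """
    return [(k, harmonic // k)
            for k in sounding_partials
            if harmonic % k == 0 and harmonic // k <= source_harmonics]

# Hypothetical quartet covering partials 1, 3, 5 and 7, omitting 2 and 4.
quartet = [1, 3, 5, 7]
for harmonic in range(1, 9):
    print(harmonic, contributors(harmonic, quartet, source_harmonics=8))
```

In this example the omitted second and fourth partials are still supplied by the overtones of the player on the fundamental, whereas the sixth harmonic is covered twice, once by the fundamental's player and once by the player on the third partial.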
Speaking of the clarinet, it may be fruitful to seek out any instruments that have more idiosyncratic patterns of partials. When I was struggling to create the clarinet’s overtone series, with its attenuated even partials, I kept thinking, “I know what instrument could create this pattern – a clarinet!” before remembering that it was the very instrument I was trying to recreate. It is a blessing, however, to have instruments in our roster that, if chosen in the right combination, might be able to skip certain attenuated partials and allow us to create a more detailed and convincing reconstruction.
Lastly, it is much easier to add high frequency data than it is to subtract it. If a string quartet is trying to emulate the sound of a bright instrument, an effective method could be to have three of the instruments playing the lower partials with harmonics, but have the highest instrument play its partial as a conventional note rather than a harmonic, or even ponticello, to bring out those higher frequencies.
Analyzing the cry of the elephant in the room
Even if we can create the most perfect emulation of a particular instrument, we must ask ourselves an important question: why? If we want an ensemble to sound like a violin, our easiest option would be to include a violin. Certain pieces that focus nearly entirely on replicating a harmonic spectrum, such as James Tenney’s Critical Band[7], can sometimes strike me more as proofs of concept than polished compositions, but I would argue that having the ability to convincingly reproduce one instrument’s timbre on other instruments can be a very valuable composition tool, provided that it is not our only tool. One could understandably see accurate timbral reconstructions as the music equivalent of hyperrealism in visual art and argue that the expressiveness of the work is lost for a gimmick.
I believe the best way to avoid this is to play with faithfulness as an element, rather than adhere to it strictly. Timbre is just one element of a note, separate from its envelope and pitch, and by exhibiting accuracy in certain elements while eschewing it in others, we can transform from hyperrealism to surrealism, or perhaps a trompe-l’oreille. In my recent sextet for string quartet and percussion, I experimented with this by beginning a note with a struck marimba, then having the string quartet take over the sustain of that note by mimicking its lower spectrum. The quartet then shifts the pitch to a new tone as a bowed vibraphone (whose spectrum is not that dissimilar to the marimba’s sustain) further perpetuates this new pitch to clear the floor for the cycle to begin again on a new note, with the old note slowly decaying under it. While the quartet’s replica of the spectrum was a crude four-partial construction, I believe that a more faithful spectral reconstruction would only have enhanced the expressiveness. The magic lay in the fact that a normally short, percussive, and fixed-pitch note was suddenly being artificially sustained and pitch-bent, and this seamlessness was only made possible by spectral mimicry.
In future compositions, I am eager to explore morphing spectra, where one instrument transforms into another, or into a chord. Faithfulness to a target timbre could be seen as a parameter rather than a fixed value, and a beautiful synergistic construction could decay into a crude, out-of-tune approximation, or an inharmonic cloud. What is clear is that optimizing the potential of spectralism opens up new and exciting tools that can be employed when a piece calls for it, and discarded when it doesn’t.
References
Rosen, Stuart (2011). Signals and Systems for Speech and Hearing (2nd ed.). BRILL. p. 163.
Prestini, E (2004). The Evolution of Applied Harmonic Analysis: Models of the Real World. Birkhäuser. p. 56.
Allain, R. (2021, February 7). Representing a Square Wave With a Fourier Series and Python. Retrieved from Level Up Coding: https://levelup.gitconnected.com/representing-a-square-wave-with-a-fourier-series-and-python-6d43beb19442
Klingbeil, M (2021). SPEAR. Sinusoidal Partial Editing Analysis and Resynthesis (Version 0.8.2) [Computer software] https://www.klingbeil.com/spear
K. Lisa Yang Center for Conservation Bioacoustics (2021). Raven Lite (Version 2.0.3) [Computer software] https://ravensoundsoftware.com/software/raven-lite/
Ableton (2021). Ableton Live Suite (Version 11.0.11) [Computer software] https://www.ableton.com/
Tenney, J. (2000). Critical Band: 1988/2000. Lebanon, NH: Frog Peak Music, 2000s.
Appendix 1. Sources of audio used for analysis and constructions
Clarinet sample
Non-vibrato sample of Bb clarinet playing a C4 taken from The University of Iowa Musical Instrument Samples. https://theremin.music.uiowa.edu/MIS-Pitches-2012/MISBbClarinet2012.html
Trumpet sample
Non-vibrato sample of Bb trumpet playing a C4 taken from The University of Iowa Musical Instrument Samples. https://theremin.music.uiowa.edu/MIS-Pitches-2012/MISBbTrumpet2012.html
Sine wave
Created using Ableton Live Wavetable synthesizer. https://www.ableton.com/en/packs/wavetable/
Square wave
Created using Ableton Live Wavetable synthesizer. https://www.ableton.com/en/packs/wavetable/
Flute
Non-vibrato samples of flute C4 taken from The University of Iowa Musical Instrument Samples. https://theremin.music.uiowa.edu/MIS-Pitches-2012/MISFlute2012.html Tuned in Ableton Live.
Strings
Generated using Spitfire Audio LABS plugin in Ableton Live, using the “Long” preset from the “Strings” sample pack. Tuned using the “master tuning” parameter in LABS. https://labs.spitfireaudio.com/
String harmonics
Samples taken from Andrew Hugill’s website, The Orchestra: A User’s Manual. https://andrewhugill.com/OrchestraManual/ Samples of cello and violin harmonics were taken from https://andrewhugill.com/OrchestraManual/cello_harmonics.html and https://andrewhugill.com/manuals/violin/harmonics.html respectively.
Appendix 2. Analysis of audio samples and construction of models
Samples were imported into SPEAR and analyzed with the default settings. Partials below a -50 dB threshold were selected and deleted. In some cases, a -40 dB threshold was used instead when too many superfluous or inharmonic partials remained.
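The pruning step amounts to a simple level threshold. The sketch below applies one to a list of (frequency, level) pairs and, as an extra not performed in SPEAR, annotates each surviving partial with its nearest harmonic number and its deviation in cents, so that inharmonic leftovers are easy to spot. The function name and the example data are hypothetical.

```python
import math

def prune_partials(partials, f0_hz, threshold_db=-50.0):
    """Keep partials at or above threshold_db and annotate the survivors.

    partials: iterable of (frequency_hz, level_db) pairs.
    Returns a list of (harmonic_number, deviation_cents, level_db), where
    deviation_cents measures the distance from the nearest exact harmonic
    of f0_hz.
    """
    kept = []
    for freq_hz, level_db in partials:
        if level_db < threshold_db:
            continue
        n = max(1, round(freq_hz / f0_hz))
        deviation_cents = 1200 * math.log2(freq_hz / (n * f0_hz))
        kept.append((n, deviation_cents, level_db))
    return kept

# Hypothetical analysis data for a C4 (about 261.63 Hz), not real measurements.
example = [(261.6, -12.0), (523.3, -48.0), (700.0, -45.0),
           (784.9, -20.0), (1046.5, -55.0)]

for n, cents, level in prune_partials(example, 261.63, threshold_db=-50.0):
    print(f"harmonic {n}: {cents:+.1f} cents, {level:.1f} dB")
```

In this made-up data, raising the threshold to -40 dB removes the stray 700 Hz component along with the weak second partial, which mirrors the trade-off of the stricter setting.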
The samples were then reimported into Raven Lite, and condensed to a single channel. A selection in the middle of the sample was made to focus the analysis on a sustained portion of the timbre. A spectrogram was then created, and the first ten peaks were measured using the measurement tool. These peaks were then used to set the levels of each partial in Ableton Live.
To create the timbral reconstruction models in Ableton Live, ten tracks of either audio or MIDI were created. Audio samples or plugins for each partial were placed in these separate tracks before being tuned with a precision of 1 cent. The volume for each track was set to the levels determined using Raven Lite.
These constructions were then exported as WAV files and reanalyzed with Raven Lite to visualize the metaspectrum.
Finally, the exported constructions were further processed in Ableton Live using the Max for Live envelope follower plugin, which followed the envelope of the target audio sample and recreated it on the constructed model.