Could I easily and algorithmically make recordings more monotone?
The answer turns out to be “sorta”. I’ve written a simple Python script that uses the Echo Nest Remix library to do some fine-grained pitch-shifting to the songs I throw at it, and the results vary widely from song to song for a variety of reasons I’ll explain below with some examples. If you want to skip the nerd talk and jump straight to the weirded-up music, see the bottom of the post for a collection of songs I ran through the script.
Here’s an example of a song that produces very clear results: Pink Floyd, Comfortably Numb
Note that the song seems to pretty much hang on a single chord throughout (or more specifically seems to idle back and forth between a major and a minor version of the same chord), while the melody and the guitar solos seem to hop around into weird jarring normal-then-chipmunks-then-normal territory. I’ll talk about this in more detail in a bit, but the key thing happening here is that the script is shifting basically every chord in the song to the tonic, the root key of the song. So instead of a chord sequence like Bm -> A -> G -> Bm -> D we end up hearing Bm -> B -> B -> Bm -> B.
Note also just how flat the song becomes in the process. Instead of big satisfying chord changes and nice minor-into-major catharsis when the verse breaks into the chorus, it’s all just…samey. This is actually something like an ideal outcome for the monotoning process, if not an ideal way to produce engaging recordings.
My friend Roj suggested a good alternate name for this process: “boringization”. When the process works well, it strips out most of the musical movement in a song, producing something far more static, at best sort of hypnotic and at worst simply dull and uninvolving.
One Note Pony
Here’s a far simpler example: Suzanne Vega, Tom’s Diner
With an a cappella track, the job of the script is very simple: figure out what note the singer is singing, and pitch-shift it to whatever the tonic of the song is supposed to be. The result is Suzanne Vega singing one note over and over again. (Or nearly: the script’s pitch detection is simple and imperfect, so it can be fooled for various reasons, and we get moments where she’s shifted to the wrong note.)
And while the note remains the same, the quality of her voice changes significantly from note to note. This is a result of the pitch-shifting being done—when you take a complex sound like the human voice and shift all the frequencies up or down significantly, the result will generally not sound right because the distribution of frequencies in your pitch-shifted sample is different from the frequencies present in that person’s voice when they’re actually singing that lower or higher note. In other words, the timbre, roughly the frequency qualities of the voice, is changed.
This is why voices sound funny when music is played back at the wrong speed on an analog music player (e.g. cassette deck or vinyl turntable), hence the Alvin and the Chipmunks or Boomy Satan effects everybody is familiar with regardless of whether they’re an acoustics nerd. (The effect of inhaled helium or xenon on the human voice is not unrelated—the differing density of those gases induces a timbral change in sound created while it’s being exhaled through your larynx. But that’s a different discussion.)
Desperately Seeking Tonic
So what is actually going on with the monotoning process?
Every song, before my own code does anything at all, gets put through a process (the Echo Nest “Analyze” function) that looks in close detail at the composition of a recording. That process does a lot of things, but only three are particularly important for what my script is trying to accomplish. Those three things are:
1. Cutting the song up into very small pieces called segments, usually some fraction of a second long each.
2. Analyzing the frequency content of each of those segments to produce a list of twelve pitch values that correspond to the twelve notes on the familiar western chromatic scale (e.g. the twelve black and white keys that make up an octave on a piano keyboard, from A to G# by half-steps or semitones), with a larger number for any given pitch value meaning that more of the frequencies that correspond to that note are present.
3. Analyzing the whole song to make a guess what its tonic note is (that is, what key is it in, e.g. C or F or G# minor).
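To make the data concrete, here’s a rough sketch of what one analyzed segment looks like to the script. The field names here are my own illustrative paraphrase, not the exact Analyze output schema; the important part is the twelve-element pitch vector and how the “loudest” pitch class falls out of it:

```python
# Hypothetical shape of one analyzed segment (field names are
# illustrative, not the exact Echo Nest Analyze schema).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

segment = {
    "start": 12.38,      # seconds into the song
    "duration": 0.27,    # segments are fractions of a second
    # One strength value per chromatic pitch class;
    # bigger means more of that note's frequencies are present.
    "pitches": [0.1, 0.05, 0.2, 0.1, 0.9, 0.15,
                0.1, 0.6, 0.1, 0.2, 0.1, 0.3],
}

# The dominant pitch class is just the argmax of that vector.
dominant = max(range(12), key=lambda i: segment["pitches"][i])
print(NOTE_NAMES[dominant])  # E, in this made-up example
```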
What my script does is take that tonic note from (3) and treat that as the note that we want to rule over the whole song. Then, for every single little segment of the song, in order, the script looks through that segment’s array of twelve pitch values and finds the one that’s the biggest — roughly, the note on the scale that is loudest for that little bit of the song. And it compares that note to the tonic note. If they’re the same, nothing happens.
If the tonic and the detected pitch are different, the current segment gets pitch shifted by a number of semitones equal to the distance between the detected pitch and the tonic.
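The per-segment decision the last two paragraphs describe could be sketched roughly like this. This is my paraphrase of the logic rather than the actual script, and I’ve had it pick whichever direction is the shorter move around the octave, which the real script doesn’t necessarily do:

```python
def semitone_shift(pitches, tonic):
    """How far to pitch-shift one segment so its loudest pitch
    class lands on the tonic.

    pitches: twelve strength values, index 0 = C ... 11 = B.
    tonic:   pitch-class index of the song's detected key.
    Returns a signed number of semitones (0 = leave it alone).
    """
    detected = max(range(12), key=lambda i: pitches[i])
    if detected == tonic:
        return 0  # already on the tonic; don't touch it
    # Distance up from detected to tonic, wrapped to one octave.
    up = (tonic - detected) % 12
    # Pick whichever direction is the shorter move.
    return up if up <= 6 else up - 12

# Comfortably Numb example: tonic B (index 11), a segment
# dominated by A (index 9) gets shifted up two semitones.
```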
So, coming back to Comfortably Numb as an example, if the detected tonic of our song is Bm (the first chord of the song and the one it keeps coming back to), then it won’t shift the first bit of the verse (“Hello / Is there anybody”) at all; but when the verse changes from a Bm chord to an A chord (“In there / just nod if you can”), the script notices that instead of a big fat B being played by the bass, it’s now a big fat A. A is two semitones away from B (A, A#, B), and so the script shifts those segments in the A chord portion of the verse up two steps. Et voilà, instead of the chords going Bm -> A we end up with Bm -> B.
Likewise, with the Tom’s Diner example we saw (pretty much) every single note of the song shifted to the tonic note.
Fighting For Attention
This is complicated, though: with the exception of very simple cases like a solo a cappella performance, songs tend to have a lot of notes happening at any given time. You’ll usually have, at a bare minimum, some major or minor chord triad playing on keyboards or guitars or other polyphonic rhythm instruments, a bass line playing something that may or may not stick to the root of the current chord, and a vocal melody that may (in fact usually will) stray constantly from the root of the current chord. Mix in stuff like lead guitar and vocal harmonies and things can get very sonically complicated.
So how does the script deal with this?
It doesn’t, in any clever way. It’s about as stupid a script as you could hope for; it does its one job (“Fetch the loudest pitch of the current segment, boy! Attascript! Who’s a good script! Who’s a gooood script!”) and doesn’t care what part of the mix is responsible for making that dominating bit of noise.
Which means that what happens to a song when it goes through the monotoning process depends largely on how that song was recorded and mixed in the first place.
There are three general outcomes that I’ve seen in my initial experiments:
1. Dominating bass/rhythm mix. Comfortably Numb is a great example of this: the bassline is steady and the rhythm guitar and strings playing the chords are as well, and they’re mixed loud enough that in any given segment they tend to be producing a big fat tonic note that keeps the chords really recognizable. The script ends up keeping the whole song very single-chord as a result, and the vocals sort of jump all over the place as they get shifted around.
2. Dominating vocal line. Tom’s Diner is a trivial example of this, since it’s all vocal, but in any song where the vox are mixed relatively loud compared to the rest of the instrumentation, there’s a good chance you’ll see a similar effect: the vocals stay on pretty much one note (or two notes, an octave apart, jumping back and forth, for stupid-pitch-shift-calculation-algorithm reasons), and the rest of the song will flail around in the background with very weird chord changes.
A good example of that happening in a full mix is the Crash Test Dummies’ mid-90s earworm Mmm Mmm Mmm Mmm, in which Brad Roberts’s seismic lead vocals are present enough in the mix to catch the script’s attention most of the time, yielding a very weird and far less catchy rendition in which he seems to be trying to destroy some architecture through meditative humming:
3. Total madness. If neither (1) nor (2) holds, that generally means there’s no one characteristically dominating sonic element in the mix, and so the script ends up following different pieces of the mix from moment to moment. Here the bassline, but then the lead vocal, but then the solo guitar line, but then the synthy thing that swelled up a bit, and so on. Songs along these lines tend to be pure weird chaos to listen to, which is great if you’re into that sort of thing (hi!) but makes for less compellingly clear “look at this specific thing that’s happening” examples.
See for example The Shangri-las, Leader of the Pack, which goes from charming Spector-era pop tragedy to horrible horrible nightmare music:
Of the several dozen songs I tried, the ones in the collection I’ve posted below mostly fall toward types 1 and 2, though there are a couple that can’t seem to make up their mind and so are either sort of 1/2 hybrids that go back and forth or outright type 3 messes.
These Are A Few Of My Favorite Monotothings
(love the monotone guitar riff)
(monotone bluegrass is actually plausible)
(another a cappella example)
(classic rock = type 1?)
Notes For Improvement
I’ve got a few ideas in mind for how to improve/expand this script.
One obvious problem with it is that it’s very twitchy: even when doing a pretty good job of creating a steady monotone feel (whether vocally or in terms of bass/chord lines), it’ll have little moments where it freaks out because of some transient element in the music and leap to a way different note for half a second. If I were to create a smoothing function that looked for areas that were steady and ignored brief variations between those steadier bits, that’d probably help with glitch reduction.
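One way to sketch that smoothing idea, assuming we already have the per-segment detected pitch classes as a list, is a simple majority-vote window (a hypothetical fix, not something the script does yet):

```python
from collections import Counter

def smooth_pitches(detected, window=5):
    """Replace each segment's detected pitch class with the most
    common value in a small window around it, so a half-second
    transient can't yank the shift to a different note.

    detected: list of pitch-class indices (0-11), one per segment.
    """
    half = window // 2
    smoothed = []
    for i in range(len(detected)):
        lo = max(0, i - half)
        hi = min(len(detected), i + half + 1)
        # The most common pitch class in the neighborhood wins.
        winner = Counter(detected[lo:hi]).most_common(1)[0][0]
        smoothed.append(winner)
    return smoothed

# A lone transient 7 in a steady run of 4s gets voted away:
print(smooth_pitches([4, 4, 4, 7, 4, 4, 4]))
# [4, 4, 4, 4, 4, 4, 4]
```

A median filter or a minimum-run-length rule would do much the same job; the point is just that a single glitchy segment shouldn’t get to pick its own shift.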
Another issue, harder to deal with automatically, is controlling the balance of pitchshifting up vs. pitchshifting down. A couple things come into this:
First, some songs have the melody mostly move around in the notes between the tonic and the fifth of the scale, whereas other songs have a melody that dives down below the tonic a bit. Choosing to shift up or down is the difference between Satan Voice and Chipmunk Voice, and bigger shifts have more blatant effects. So it’d be good to try and pick a balance point where neither the up nor the down of the vocal is any larger than it has to be. Striking that balance point would depend on analyzing the melody, which is pretty easy to do with a human brain and a quick listen but is harder to do with my narrow existing programmatic toolset.
Secondly, the pitch info available for a segment is only a list of twelve notes on an abstract scale — you can find out what named notes are present, essentially, C and F# and A, but you don’t know what octaves those are in. Which means that knowing whether the current note is “below” or “above” another detected note in a different segment is impossible at a glance, which can lead to situations where instead of ideally pitching G4 (i.e. a G note in the fourth octave) up two semitones to A4 and B4 down two semitones to A4 (with minimal timbral whackery), the script can end up shifting G4 down ten semitones to A3 (hello, Satan!) or B4 up ten semitones to A5 (Alvin, Theodore, et al).
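The octave problem boils down to this: with only pitch classes to work from, there are always exactly two shifts within an octave that land on the tonic, one up and one down, and nothing in the data says which one keeps the audio near its original register. A tiny illustration (pitch-class indices are the usual 0 = C through 11 = B):

```python
def candidate_shifts(detected, tonic):
    """The two within-an-octave shifts that move a detected pitch
    class onto the tonic. Without octave information there's no
    principled way to choose between them.

    detected, tonic: pitch-class indices, 0 = C ... 11 = B.
    """
    up = (tonic - detected) % 12   # shift up to the tonic
    down = up - 12                 # same pitch class, octave lower
    return up, down

# G (7) to A (9): either +2 semitones (G4 -> A4, mild)
# or -10 semitones (G4 -> A3, hello Satan).
print(candidate_shifts(7, 9))   # (2, -10)
# B (11) to A (9): either +10 (B4 -> A5, chipmunks) or -2.
print(candidate_shifts(11, 9))  # (10, -2)
```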
There may be good ways to deal with that stuff more elegantly. I haven’t tried yet.
I also wonder if I can use the segment timbre information that Echo Nest provides to make some sort of educated guesses about what kind of sound or sounds are responsible for generating the current dominating pitch. It might be possible to at least partially differentiate between e.g. round bass notes vs. sharper vocal or guitar notes and do some intentional targeting of one part of the recorded mix vs. others for what the script pays attention to.
Another possible approach to this whole idea would be to look not just at the single loudest pitch for every segment but at all the pitches present, and try to identify chords based on the three or four most present pitches in the segment. If you’ve got proportionally a lot of C, E, and G standing out in a segment, it’s a good guess that you’re dealing with a C major chord, whatever else might be going on. Combining this sort of chord detection with the glitch-smoothing idea above might make it possible to analytically ignore parts of the mix and really just focus on the chords underlying a song (to pull off a stronger type-1 It’s All One Chord effect) or to attend to the vocal line when it seems to stray from the chord roots (to better achieve a type-2 The Singer Only Knows One Note thing).
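A toy version of that chord-detection idea: take the three strongest pitch classes in a segment and check them against major and minor triad shapes. This is a sketch under the (optimistic) assumption that the triad tones actually dominate the pitch vector, not a robust analyzer:

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def guess_chord(pitches):
    """Guess a major or minor triad from the three strongest
    pitch classes in a segment's twelve-value pitch vector.
    Returns e.g. "C" or "Am", or None if no plain triad fits."""
    top3 = set(sorted(range(12), key=lambda i: pitches[i])[-3:])
    for root in range(12):
        major = {root, (root + 4) % 12, (root + 7) % 12}
        minor = {root, (root + 3) % 12, (root + 7) % 12}
        if top3 == major:
            return NOTE_NAMES[root]
        if top3 == minor:
            return NOTE_NAMES[root] + "m"
    return None

# Lots of C, E, and G standing out reads as C major:
pitches = [0.9, 0.1, 0.1, 0.1, 0.8, 0.1,
           0.1, 0.7, 0.1, 0.1, 0.1, 0.1]
print(guess_chord(pitches))  # C
```

A real version would want to weight the pitch strengths rather than hard-threshold the top three, and handle sevenths and inversions, but even this crude matching plus the smoothing idea above would give the script something chord-shaped to follow.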
That last idea might also make for a way to build a rough chord-analyzer system, which would be a boon to anybody who needed the chords for an mp3 and couldn’t find nice simple tabs for it anywhere.
Anyway, I’ve had a lot of fun with this so far. If you have specific ideas for how to improve or differently approach this idea, or have a song in mind that you think might produce interesting monotonization output, let me know.