Monotoning with Echo Nest Remix

A question occurred to me a while back while exploring the Echo Nest Remix toolset (see previous NIN tomfoolery):

Could I easily and algorithmically make recordings more monotone?

The answer turns out to be “sorta”. I’ve written a simple Python script that uses the Remix library to do some fine-grained pitch-shifting on the songs I throw at it, and the results vary widely from song to song for a variety of reasons I’ll explain below with some examples. If you want to skip the nerd talk and jump straight to the weirded-up music, see the bottom of the post for a collection of songs I ran through the script.

Boringization

Here’s an example of a song that produces very clear results: Pink Floyd, Comfortably Numb

[Audio clip]

Note that the song seems to pretty much hang on a single chord throughout (or more specifically seems to idle back and forth between a major and a minor version of the same chord), while the melody and the guitar solos seem to hop around into weird jarring normal-then-chipmunks-then-normal territory. I’ll talk in more detail about this in a bit, but the key thing happening here is that the script is shifting basically every chord in the song to the tonic, the root note of the song’s key. So instead of a chord sequence like Bm -> A -> G -> Bm -> D we end up hearing Bm -> B -> B -> Bm -> B.

Note also just how flat the song becomes in the process. Instead of big satisfying chord changes and nice minor-into-major catharsis when the verse breaks into the chorus, it’s all just…samey. This is actually something like an ideal outcome for the monotoning process, if not an ideal way to produce engaging recordings.

My friend Roj suggested a good alternate name for this process: “boringization”. When the process works well, it strips out most of the musical movement in a song, producing something far more static, at best sort of hypnotic and at worst simply dull and uninvolving.

One Note Pony

Here’s a far simpler example: Suzanne Vega, Tom’s Diner

[Audio clip]

With an a capella track, the job of the script is very simple: figure out what note the singer is singing, and pitch shift that to whatever the tonic of the song is supposed to be. The result is Suzanne Vega singing one note over and over again. (Or nearly; the process’s pitch detection is very simple and imperfect and so it can for various reasons be fooled a bit, so we get moments where she’s shifted to the wrong note.)

And while the note remains the same, the quality of her voice changes significantly from note to note. This is a result of the pitch-shifting being done: when you take a complex sound like the human voice and shift all the frequencies up or down significantly, the result will generally not sound right, because the distribution of frequencies in your pitch-shifted sample is different from the frequencies present in that person’s voice when they’re actually singing that lower or higher note. In other words, the timbre (roughly, the frequency qualities of the voice) is changed.

This is why voices sound funny when music is played back at the wrong speed on an analog music player (e.g. a cassette deck or vinyl turntable), hence the Alvin and the Chipmunks or Boomy Satan effects everybody is familiar with regardless of whether they’re an acoustics nerd. (The effect of inhaled helium or xenon on the human voice is not unrelated: the differing density of those gases induces a timbral change in sound created while it’s being exhaled through your larynx. But that’s a different discussion.)
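To put rough numbers on the timbre problem, here is a minimal sketch. It assumes a naive shift that multiplies every frequency by the same equal-temperament ratio, which is what speeding up a tape does; the three frequencies are made-up illustrative values, not measurements of anybody’s voice.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in twelve-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shift_frequencies(freqs_hz, semitones):
    """Scale every frequency by the same ratio, as a naive pitch shift does."""
    ratio = semitone_ratio(semitones)
    return [f * ratio for f in freqs_hz]


# Hypothetical spectrum of one sung note: a fundamental plus two
# resonance peaks contributed by the shape of the singer's mouth.
voice = [220.0, 700.0, 1200.0]

# Shift up five semitones: the fundamental moves (which we want), but the
# resonance peaks move by the same factor (which a real singer's would not).
shifted = shift_frequencies(voice, 5)
for before, after in zip(voice, shifted):
    print(f"{before:7.1f} Hz -> {after:7.1f} Hz")
```

Everything gets scaled by about a third, resonances included, and that wholesale move is the chipmunk part.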

Desperately Seeking Tonic

So what is actually going on with the monotoning process?

Every song, before my own code does anything at all, gets put through a process (the Echo Nest “Analyze” function) that looks in close detail at the composition of a recording. That process does a lot of things, but only three are particularly important for what my script is trying to accomplish. Those three things are:

1. Cutting the song up into very small pieces called segments, usually some fraction of a second long each.
2. Analyzing the frequency content of each of those segments to produce a list of twelve pitch values that correspond to the twelve notes on the familiar western chromatic scale (e.g. the twelve black and white keys that make up an octave on a piano keyboard, from A to G# by half-steps or semitones), with a larger number for any given pitch value meaning that more of the frequencies that correspond to that note are present.
3. Analyzing the whole song to make a guess at what its tonic note is (that is, what key it’s in, e.g. C or F or G# minor).
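As a rough picture of the data those three steps hand over, here is a sketch of the shapes involved. The class and field names are hypothetical stand-ins of my own choosing, not the actual Echo Nest or Remix API, and numbering the notes from C = 0 is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed numbering of the twelve pitch classes, starting from C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass
class Segment:
    """One small slice of the song (hypothetical stand-in type)."""
    start: float          # seconds from the beginning of the track
    duration: float       # length of the slice in seconds
    pitches: List[float]  # twelve strengths, one per pitch class


@dataclass
class Analysis:
    """Whole-song analysis result (hypothetical stand-in type)."""
    tonic: int               # index into NOTE_NAMES
    segments: List[Segment]


# A toy analysis: a song guessed to be in B, with one segment dominated by A.
a_heavy = [0.1] * 12
a_heavy[NOTE_NAMES.index("A")] = 1.0
analysis = Analysis(
    tonic=NOTE_NAMES.index("B"),
    segments=[Segment(start=0.0, duration=0.25, pitches=a_heavy)],
)
print(NOTE_NAMES[analysis.tonic], len(analysis.segments[0].pitches))
```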

What my script does is take that tonic note from (3) and treat that as the note that we want to rule over the whole song. Then, for every single little segment of the song, in order, the script looks through that segment’s array of twelve pitch values and finds the one that’s the biggest (roughly, the note on the scale that is loudest for that little bit of the song). And it compares that note to the tonic note. If they’re the same, nothing happens.

If the tonic and the detected pitch are different, the current segment gets pitch shifted by a number of semitones equal to the distance between the detected pitch and the tonic.

So, coming back to Comfortably Numb as an example, if the detected tonic of our song is Bm (the first chord of the song and the one it keeps coming back to), then it won’t shift the first bit of the verse (“Hello / Is there anybody”) at all; but when the verse changes from a Bm chord to an A chord (“In there / just nod if you can”), the script notices that instead of a big fat B being played by the bass, it’s now a big fat A. A is two semitones away from B (A, A#, B), and so the script shifts those segments in the A chord portion of the verse up two semitones. Et voila, instead of the chords going Bm -> A we end up with Bm -> B.

Likewise, with the Tom’s Diner example we saw (pretty much) every single note of the song shifted to the tonic note.
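The per-segment rule described in this section can be sketched in a few lines. This is a minimal illustration and not the script itself: it works on plain lists of twelve numbers instead of real Remix segments, the function names are my own for this example, the C = 0 note numbering is an assumption, and it always shifts upward, which is just one possible choice of direction.

```python
# Assumed numbering of the twelve pitch classes, starting from C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_pitch_class(pitches):
    """Index of the strongest of the twelve pitch values."""
    return max(range(len(pitches)), key=lambda i: pitches[i])


def shift_to_tonic(pitches, tonic):
    """Semitones (0 to 11, upward) that move the loudest pitch class onto the tonic."""
    return (tonic - dominant_pitch_class(pitches)) % 12


def plan_shifts(segment_pitches, tonic):
    """One shift amount per segment, in order; 0 means leave the segment alone."""
    return [shift_to_tonic(pitches, tonic) for pitches in segment_pitches]


def toy_segment(note):
    """Build a made-up pitch vector in which `note` clearly dominates."""
    pitches = [0.1] * 12
    pitches[NOTE_NAMES.index(note)] = 1.0
    return pitches


# The Comfortably Numb verse: a B-heavy stretch followed by an A-heavy one.
tonic = NOTE_NAMES.index("B")
print(plan_shifts([toy_segment("B"), toy_segment("A")], tonic))
```

The B-heavy segment is left alone and the A-heavy one gets moved up two semitones, matching the Bm -> A becoming Bm -> B example. The actual pitch shifting of the audio is a separate job that this sketch doesn’t attempt.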

Fighting For Attention

This is complicated, though: with the exception of very simple cases like a solo a capella performance, songs tend to have a lot of notes happening at any given time. You’ll usually have, at a bare minimum, some major or minor chord triad playing on keyboards or guitars or other polyphonic rhythm instruments, a bass line playing something that may or may not stick to the root of the current chord, and a vocal melody that may (in fact usually will) stray constantly from the root of the current chord. Mix in stuff like lead guitar and vocal harmonies and things can get very sonically complicated.

So how does the script deal with this?

It doesn’t, in any clever way. It’s about as stupid a script as you could hope for; it does its one job (“Fetch the loudest pitch of the current segment, boy! Attascript! Who’s a good script! Who’s a gooood script!”) and doesn’t care what part of the mix is responsible for making that dominating bit of noise.

Which means that what happens to a song when it goes through the monotoning process depends largely on how that song was recorded and mixed in the first place.

There are three general outcomes that I’ve seen in my initial experiments:

1. Dominating bass/rhythm mix. Comfortably Numb is a great example of this: the bassline is steady and the rhythm guitar and strings playing the chords are as well, and they’re mixed loud enough that in any given segment they tend to be producing a big fat tonic note that keeps the chords really recognizable. The script ends up keeping the whole song very single-chord as a result, and the vocals sort of jump all over the place as they get shifted around.

2. Dominating vocal line. Tom’s Diner is a trivial example of this, since it’s all vocal, but with any song where the vox are mixed relatively loudly compared to the rest of the instrumentation there’s a good chance you’ll see a similar effect: the vocals stay on pretty much one note (or two notes, an octave apart, jumping back and forth, for stupid-pitch-shift-calculation-algorithm reasons), and the rest of the song will flail around in the background with very weird chord changes.

A good example of that happening in a full mix is the Crash Test Dummies’ mid-90s earworm Mmm Mmm Mmm Mmm, in which Brad Roberts’s seismic lead vocals are present enough in the mix to catch the script’s attention most of the time, yielding a very weird and far less catchy rendition in which he seems to be trying to destroy some architecture through meditative humming:

[Audio clip]

3. Total madness. If neither (1) nor (2) holds, that generally means there’s no one characteristically dominating sonic element in the mix, and so the script ends up following different pieces of the mix from moment to moment. Here the bassline, but then the lead vocal, but then the solo guitar line, but then the synthy thing that swelled up a bit, and so on. Songs along these lines tend to be pure weird chaos to listen to, which is great if you’re into that sort of thing (hi!) but makes for less compellingly clear “look at this specific thing that’s happening” examples.

See for example The Shangri-Las, Leader of the Pack, which goes from charming Spector-era pop tragedy to horrible horrible nightmare music:

[Audio clip]

Of the several dozen songs I tried, the collection I’ve posted below mostly falls toward types 1 and 2, though there are a couple that can’t seem to make up their minds and so are either sort of 1/2 hybrids that go back and forth or outright type 3 messes.

These Are A Few Of My Favorite Monotothings

[Audio clip]

[Audio clip]

(love the monotone guitar riff)

[Audio clip]

[Audio clip]

(monotone bluegrass is actually plausible)

[Audio clip]

[Audio clip]

[Audio clip]

[Audio clip]

[Audio clip]

[Audio clip]

(another a capella example)

[Audio clip]

[Audio clip]

[Audio clip]

[Audio clip]

[Audio clip]

[Audio clip]

[Audio clip]

(classic rock = type 1?)

Notes For Improvement

I’ve got a few ideas in mind for how to improve/expand this script.

One obvious problem with it is that it’s very twitchy: even when it’s doing a pretty good job of creating a steady monotone feel (whether vocally or in terms of bass/chord lines), it’ll have little moments where it freaks out because of some transient element in the music and leaps to a way different note for half a second. If I were to create a smoothing function that looked for areas that were steady and ignored brief variations between those steadier bits, that’d probably help with glitch reduction.
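A first stab at that smoothing function might look like the sketch below. It is untested against real songs and rests on a couple of assumptions: it works on the list of detected pitch classes (one integer per segment), it treats any run shorter than a hypothetical `min_run` threshold as a glitch, and it papers over glitches with the most recent steady note.

```python
def smooth_pitch_classes(classes, min_run=3):
    """Replace short-lived detections with the last steady pitch class.

    `classes` holds one detected pitch class (0 to 11) per segment.
    Runs shorter than `min_run` segments are treated as glitches.
    """
    # Run-length encode the sequence as [value, length] pairs.
    runs = []
    for value in classes:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])

    smoothed = []
    steady = None  # most recent pitch class that lasted long enough
    for value, length in runs:
        if length >= min_run:
            steady = value
        # Before any steady run has been seen, keep the detection as it is.
        fill = value if steady is None else steady
        smoothed.extend([fill] * length)
    return smoothed


# A steady B (11) with a one-segment blip to E (4), then a steady A (9).
print(smooth_pitch_classes([11, 11, 11, 4, 11, 11, 11, 9, 9, 9, 9]))
```

Counting segments is a crude measure, since segments vary in length; a version that summed segment durations would probably behave better.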

Another issue, harder to deal with automatically, is controlling the balance of pitchshifting up vs. pitchshifting down. A couple things come into this:

First, some songs have the melody mostly move around in the notes between the tonic and the fifth of the scale, whereas other songs have a melody that dives down below the tonic a bit. Choosing to shift up or down is the difference between Satan Voice and Chipmunk Voice, and bigger shifts have more blatant effects. So it’d be good to try and pick a balance point where neither the up nor the down of the vocal is any larger than it has to be. Striking that balance point would depend on analyzing the melody, which is pretty easy to do with a human brain and a quick listen but is harder to do with my narrow existing programmatic toolset.

Secondly, the pitch info available for a segment is only a list of twelve notes on an abstract scale. You can find out what named notes are present, essentially, C and F# and A, but you don’t know what octaves those are in. Which means that knowing whether the current note is “below” or “above” another detected note in a different segment is impossible at a glance, which can lead to situations where instead of ideally pitching G4 (i.e. a G note in the fourth octave) up two semitones to A4 and B4 down two semitones to A4 (with minimal timbral whackery), the script can end up shifting G4 down ten semitones to A3 (hello, Satan!) or B4 up ten semitones to A5 (Alvin, Theodore, et al).

There may be good ways to deal with that stuff more elegantly. I haven’t tried yet.
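One cheap starting point, which doesn’t recover the missing octave information at all but at least caps the damage, would be to always take the shorter way around the circle of twelve notes. This is a sketch of that idea only, with a function name of my own choosing and the assumption that notes are numbered 0 to 11 from C.

```python
def nearest_shift(detected, tonic):
    """Signed shift, in semitones, with the smallest magnitude.

    Both arguments are pitch classes (0 to 11). The result lies between
    -5 and +6, so no segment is ever moved more than six semitones.
    """
    up = (tonic - detected) % 12
    return up - 12 if up > 6 else up


G, A, B = 7, 9, 11
print(nearest_shift(G, A))  # G goes up to A
print(nearest_shift(B, A))  # B goes down to A
```

With A as the target, G goes up two semitones and B goes down two, which is the ideal case from the G4/B4 example. A note exactly six semitones away is a coin flip; this sketch arbitrarily sends it up.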

I also wonder if I can use the segment timbre information that Echo Nest provides to make some sort of educated guesses about what kind of sound or sounds are responsible for generating the current dominating pitch. It might be possible to at least partially differentiate between e.g. round bass notes vs. sharper vocal or guitar notes and do some intentional targeting of one part of the recorded mix vs. others for what the script pays attention to.

Another possible approach to this whole idea would be to look not just at the single loudest pitch for every segment but at all the pitches present, and try to identify chords based on the three or four most present pitches in the segment. If you’ve got proportionally a lot of C, E, and G standing out in a segment, it’s a good guess that you’re dealing with a C major chord, whatever else might be going on. Combining this sort of chord detection with the glitch-smoothing idea above might make it possible to analytically ignore parts of the mix and really just focus on the chords underlying a song (to pull off a stronger type-1 It’s All One Chord effect) or to attend to the vocal line when it seems to stray from the chord roots (to better achieve a type-2 The Singer Only Knows One Note thing).

That last idea might also make for a way to build a rough chord-analyzer system, which would be a boon to anybody who needed the chords for an mp3 and couldn’t find nice simple tabs for it anywhere.
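The chord-guessing idea could start from something as simple as template matching: add up the pitch values at the root, third, and fifth of every major and minor triad, and pick the triad with the biggest total. This is a hypothetical sketch over a plain list of twelve numbers (numbered from C = 0 by assumption), and it knows nothing about sevenths, inversions, or anything else fancy.

```python
# Assumed numbering of the twelve pitch classes, starting from C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets from the root for the two triad shapes considered.
TRIADS = (("", (0, 4, 7)),   # major: root, major third, fifth
          ("m", (0, 3, 7)))  # minor: root, minor third, fifth


def guess_chord(pitches):
    """Name of the major or minor triad whose notes are strongest in `pitches`."""
    best_name, best_score = None, float("-inf")
    for root in range(12):
        for suffix, offsets in TRIADS:
            score = sum(pitches[(root + offset) % 12] for offset in offsets)
            if score > best_score:
                best_name, best_score = NOTE_NAMES[root] + suffix, score
    return best_name


# A made-up segment with a lot of C, E, and G standing out.
segment = [1.0, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1]
print(guess_chord(segment))
```

A loud vocal note outside the chord would still drag the score around, so in practice this would want the smoothing idea layered on top.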

Anyway, I’ve had a lot of fun with this so far. If you have specific ideas for how to improve or differently approach this idea, or have a song in mind that you think might produce interesting monotonization output, let me know.

23 Comments

  1. Chris Mear wrote:

    This is atrocious, and I mean that as the sincerest compliment.

    Monday, August 9, 2010 at 1:06 pm | Permalink
  2. Steve wrote:

    I think you could do a bunch of things to make the pitch detection focus on vocals and not so much on the music. First, for each of your small segments, you should bandpass it in the range of something like 400Hz-1500Hz and forget the rest (high-pitched musical notes and the bassline).

    Additionally, if your segments are short enough, you should probably just do some time-weighted average over N segments, instead of doing each one individually. That would lead to less oscillation during what we mostly observe as “constant” notes.

    Also, since you have a stereo source (right?) you could try to isolate the vocals from the music by differencing the channels (phase inverted), then subtracting that result back out again, trying to isolate an in-phase “center channel” which is where your vocals are likely going to be concentrated. Use this isolated channel (bandpass filtered, as above) for your tone analysis, and then pitch-shift the original source.

    Monday, August 9, 2010 at 1:29 pm | Permalink
  3. Dan Pawlak wrote:

    Finally, an entertaining version of “Mmm, Mmm, Mmm, Mmm”! Nice work here.

    Tuesday, August 10, 2010 at 10:11 am | Permalink
  4. shay wrote:

    what is the point of all this

    Tuesday, August 10, 2010 at 12:35 pm | Permalink
  5. Josh Millard wrote:

    Only what it says on the tin, shay. If I had to have a point every time I went to do something weird to music, I’d get a lot less done.

    Wednesday, August 11, 2010 at 7:16 am | Permalink
  6. WiL wrote:

    If you don’t know what the point of this is, you don’t know what the point of living is, it is done because it was thought of and tried, just to see, end of. I for one love it, ‘Because I Got High’ is actually an improvement IMO!

    Friday, August 13, 2010 at 9:44 am | Permalink
  7. mofaha wrote:

    Fantastic stuff. Thanks for sharing.

    Friday, August 13, 2010 at 10:45 am | Permalink
  8. Jim Keith wrote:

    Interesting stuff! I didn’t listen to all the examples, but, being a Radiohead fan, quickly found my way to the Boringized version of Karma Police, where I noticed another interesting facet of your script that may not have been encountered or discussed in the other examples. Since Karma Police has a wonderful key change about halfway through, the script has managed to compute an “average tonic” for the entire song that is neither the tonic of the first or second halves. I though that was pretty cool.

    Irrelevant side note: my old band had a song called Desperately Seeking Losing that you just reminded me of as well.

    My first visit to your site, and I will be checking it all out. Cheers!

    Friday, August 13, 2010 at 11:20 am | Permalink
  9. Delia wrote:

    The explanation did the same thing to my brain that the music did to my ears…

    Friday, August 13, 2010 at 11:23 am | Permalink
  10. Louis 14 wrote:

    I do find this a little pointless – so a) we know that music that has variation taken out of it is usually boring, and b) that crude digital manipulation of music can sound freaky.

    What would be more interesting to me is why, considering point a), a couple of decades-worth of disco/club tunes based around one note, or chord, or endlessly repeating riff, or even no discernible harmonic content at all, were/are considered listenable.

    Friday, August 13, 2010 at 12:03 pm | Permalink
  11. Josh Millard wrote:

    Heh. Greetings, citizens of b3tastan!

    Friday, August 13, 2010 at 12:38 pm | Permalink
  12. me wrote:

    do ‘numb’ by u2! oh wait…

    Friday, August 13, 2010 at 5:48 pm | Permalink
  13. Gene Savage wrote:

    I, for one, am completely fascinated by this! It takes songs we’re familiar with and takes them in new directions… even if those directions may require a little Dramamine. ;-)

    It makes me more aware of where the voice is going in relation to the bass / chords, it makes me more aware of the “texture” of both the recording and the singer’s voice, and at least once on this demo I understood the lyrics better… there’s a shock for ‘ya! :)

    With enough polish, this certainly could be an acoustic art tool. Imagine taking it and forcing it to not just mono-tize a song, but give it “if note equals X, make it Y,” in such a way that an A B C D E F G scale would be played back as B F A E G D C!

    You could write entirely new “songs” from original recordings… the mind boggles.

    Keep experimenting! Keep playing! And ESPECIALLY keep sharing your results, successes and failures.

    Thank you for improving my night!!!

    Friday, August 13, 2010 at 8:54 pm | Permalink
  14. Nick P wrote:

    I love this! The chaos of it is intensely appealing to me. Any way technology can mess with music and produce a different way of listening to it, even an altered way of appreciating melody, is interesting to me. I think ‘Karma Police’ in particular was a fascinating listen.

    Saturday, August 14, 2010 at 5:04 am | Permalink
  15. Fox wrote:

    Well done on making Radiohead sound even drearier!

    Saturday, August 14, 2010 at 7:12 am | Permalink
  16. Dinsdale wrote:

    I thought I was a fan of Pink Floyd and the Shangri Las, but this has made them even more interesting. Thanks!

    Sunday, August 15, 2010 at 10:29 am | Permalink
  17. David Baxter-James wrote:

    Love it. Can you produce results the other way round? Take some crappy dance/techno track and actually make it musical?

    Monday, August 16, 2010 at 7:38 am | Permalink
  18. Jeff wrote:

    A chord analyser that could be run across mp3 files would indeed be brilliant. Please put me down for one as soon as you’ve built it!

    Wednesday, August 18, 2010 at 1:16 am | Permalink
  19. peter wrote:

    I wonder if you could do something to preserve the spectrum, or keep it similar, while pitch shifting. In that first accapella I still heard it as a melody just because of the timbre changes. It’s really all the overtones, the natural filtering of the singer’s mouth, etc, that you hear getting shifted, so I think the challenge would be to make those sound constant while changing the base frequency.

    Friday, August 20, 2010 at 9:44 pm | Permalink
  20. K wrote:

    You could try this program. It seems to keep it more realistic: http://hypermammut.sourceforge.net/paulstretch/

    Example:
    http://soundcloud.com/shamantis/j-biebz-u-smile-800-slower

    Saturday, August 21, 2010 at 8:17 am | Permalink
  21. Josh Millard wrote:

    The idea of using this sort of technique to not simply monotonize but rather change the harmonic profile of a song is a fun one, I’d been chewing on it already but haven’t gotten as far as figuring out how it would work. I’m not sure how easy it would be to do in a general case with this particular approach, since this is a pretty naive and scattershot tool as is.

    But if something like my chord-analyzer idea can be made to work reasonably well, then it’d make sense to go from there to giving a “rechorder” script an input song and a chord chart and have it shift the content of the source song to match the desired chord pattern. Would probably sound insane even if it was more or less harmonically correct, but it’s a fun idea.

    And K, PaulStretch is indeed a neat piece of software (I posted about it recently) but it’s not really a pitchshifting tool in this sense.

    Saturday, August 21, 2010 at 8:36 am | Permalink
  22. cobby wrote:

    Please, please make these individually linkable, link to mp3’s, or put them on youtube. I’d very much like to make playlists and share these with my friends. Bravo!!!!!

    Friday, December 31, 2010 at 5:58 pm | Permalink
  23. Forrest wrote:

    yeah i’m a friend of cobby’s and I wholeheartedly agree. variation is good, some of it moreso, like this.

    Thursday, June 2, 2011 at 1:02 pm | Permalink
