The AI Behind Music

Esha Singaraju
10 min read · Jan 28, 2021


Photo courtesy of Franck V. on Unsplash

Spotify, with over 4 billion playlists, countless features dedicated to personalization, and some 286 million users, has taken over the music streaming industry. Software like GarageBand & Avid Pro Tools lets artists make music without touching a single physical instrument. The world of music has slowly become more digital, a journey from vinyl records to MP3 players & Apple Music. And the world’s latest breakthrough technology, Artificial Intelligence, is also paving its way into the industry. AI is allowing artists to lift little more than a finger to produce award-winning music. Allowing avid music listeners to find more artists and songs suited to them.

Allowing the music industry to evolve into an era of personalization and expression.

Wait, But What’s AI?

Artificial Intelligence, as its name might imply, is the concept of making machines more intelligent. Its purpose is to give machines tasks usually handled by humans, as a way to make our lives easier. AI mimics the way we learn so machines can perform human tasks faster and more accurately.

Machine Learning

Machine Learning is the idea of training a machine to read and understand data, and then apply what it has learned to make new decisions. Think of an octopus as a machine, and imagine teaching the octopus what qualifies as a fish and what doesn’t. You show it a salmon and tell the octopus that it’s a fish. It processes that information. You show it a shoe and tell it that the shoe is not a fish, and it processes that information as well. Later, when it is shown another shoe, the octopus knows that the shoe does not qualify as a fish.

Machine Learning comes in two main forms: supervised learning and unsupervised learning.

Supervised learning is like the octopus example above. You show the octopus a shoe, and tell the octopus that it is a shoe and not a fish. This is known as labeled data because we are… labeling the data, calling a shoe a shoe and not a fish. A supervised machine learning algorithm trained on that labeled data can then predict whether something new is a shoe or a fish (when given only the options of shoe and fish, of course).
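
Here’s a minimal sketch of that idea in code, using scikit-learn. The features (length in cm, has fins, has scales) and the tiny dataset are made up purely for illustration:

```python
# A toy supervised-learning example: we label every training sample ourselves.
# Features and values are hypothetical, just to mirror the "fish or not fish" story.
from sklearn.tree import DecisionTreeClassifier

# Labeled training data: each row is [length_cm, has_fins, has_scales]
X_train = [
    [75, 1, 1],   # salmon: has fins and scales
    [20, 1, 1],   # goldfish: has fins and scales
    [30, 0, 0],   # sneaker: no fins, no scales
    [28, 0, 0],   # boot: no fins, no scales
]
y_train = ["fish", "fish", "not fish", "not fish"]

model = DecisionTreeClassifier()
model.fit(X_train, y_train)          # the "octopus" studies the labeled examples

# Now show it something new: a 29 cm object with no fins and no scales (a shoe)
print(model.predict([[29, 0, 0]]))   # -> ['not fish']
```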

Unsupervised learning, on the other hand, is giving the octopus a shoe and letting it create its own label for that data. For example, we give the octopus a goldfish, and the octopus labels the goldfish as orange. It now knows to differentiate between an orange goldfish and everything else that is not an orange goldfish. Unsupervised learning relies on techniques like clustering and data mining to discover unique patterns and categories within the data it’s given.
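
And a matching sketch of the unsupervised version: the same kind of data, but with no labels. Here a clustering algorithm (k-means, used purely as an illustration, with made-up “size” and “orangeness” features) invents its own groups:

```python
# Unsupervised learning: no labels at all.
# The algorithm has to invent its own categories (clusters) from raw features.
import numpy as np
from sklearn.cluster import KMeans

# Made-up features: [size_cm, "orangeness" on a 0-1 scale]
objects = np.array([
    [5,  0.90],   # something small and orange (a goldfish, perhaps)
    [6,  0.95],
    [30, 0.10],   # something big and not orange (a shoe, perhaps)
    [28, 0.05],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(objects)
print(kmeans.labels_)  # e.g. [1 1 0 0] -- two self-discovered groups, no names attached
```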

Neural Networks

Courtesy of Glosser.ca

Neural Networks are an important part of AI. A neuron is a single learning unit, so, as you might imagine, a neural network is a connected collection of many different neurons. All of their inputs and outputs are intertwined, and the whole structure is modeled after the network of neurons in our brain.

Recurrent Neural Networks (RNNs) are one type of neural network. Unlike a basic neural network, an RNN can remember earlier inputs and use that memory when producing its output. RNNs are a natural fit for problems involving sequential data, such as the order of notes in a piano piece.
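
As a rough illustration (not any production system), here is a tiny PyTorch LSTM that learns to predict the next note of a repeating C-major scale, which is exactly the “remember what came before” behavior described above. The note encoding, layer sizes, and training length are all arbitrary choices:

```python
# Toy RNN: learn to predict the next note of a C-major scale.
import torch
import torch.nn as nn

notes = ["C", "D", "E", "F", "G", "A", "B"]
to_idx = {n: i for i, n in enumerate(notes)}

# Training sequence: the scale repeated a few times
seq = [to_idx[n] for n in notes * 8]
inputs = torch.tensor(seq[:-1]).unsqueeze(0)   # shape (1, len-1)
targets = torch.tensor(seq[1:])                # the "next note" labels

class NoteRNN(nn.Module):
    def __init__(self, vocab=7, emb=16, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.out(h)

model = NoteRNN()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):                       # a short training loop
    logits = model(inputs).squeeze(0)      # (len-1, vocab)
    loss = loss_fn(logits, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Ask the model: what usually comes after C, D, E?
probe = torch.tensor([[to_idx["C"], to_idx["D"], to_idx["E"]]])
print(notes[model(probe)[0, -1].argmax().item()])   # most likely "F"
```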

Convolutional Neural Networks (CNNs) are another type of neural network. A CNN builds on the basic neural network by sliding small filters over raw pixel data, which lets it detect people, places, animals, objects, and more in an image.

So, if there’s anything you’ve understood from the last few paragraphs, it’s that AI & ML detect patterns — whether it be data we spoon-feed to them or data we throw at them (like throwing someone that can’t swim in the water and letting them figure it out). And patterns are extremely valuable in music. The chord progressions we use in pop songs are patterns, the rhythm/beat in songs are patterns, even our basic major & minor scales follow a simple pattern.

AI For Listeners:

Music streaming is constantly evolving. In the past, we captured soundwaves onto vinyl. Listeners followed a small number of artists, and their taste reflected their personality. You listened to The Beatles or The Rolling Stones, not both. As time passed, recorded music moved to CDs and MP3s, allowing listeners to gather a larger collection of artists, genres, songs, and culture. In today’s world, we can listen to over 50 million songs at the tap of a button through streaming services like Spotify, Apple Music, and Pandora. These streaming services have advanced to provide you with personalized songs, artists, genres, and playlists to enhance the listening experience. And it’s crazy accurate.

Spotify’s “Discover Weekly,” courtesy of Spotify

Take Spotify’s “Discover Weekly” for example. As a religious Spotify user, I look at this feature almost every week to find new artists and songs, and I have it to thank for my ~impeccable~ music taste. Discover Weekly works by using a combination of three different AI models: Collaborative Filtering, Audio Models, and Nature Language Processing.

Collaborative Filtering

This model basically involves comparing your listening trends on Spotify to those of other users. Rather than relying on a specific feedback or rating section, Spotify tracks value through other metrics, like how many times a user replays a song, whether it was added to a playlist, how many songs by a particular artist were listened to, and so on. The model compares your listening to that of someone with similar patterns, and recommends to each of you the songs the other has that you’re missing. For this, the data contains a set of items and a set of users who have reacted to the items. Their reaction can be either explicit (favoriting a song on Spotify) or implicit (adding the song to a playlist, viewing the artist’s profile, replaying the song). This data is best represented as a matrix: each row is a user, each column is an item, and each cell holds that user’s rating of, or interaction with, the item.

For example, let’s say that there are 5 users. They’re each given five artists (Billy Joel, Taylor Swift, Pink Floyd, Harry Styles, & Elton John) and told to rate them on a scale from 1–5 if they’ve listened to them before. Looking at the matrix, we can see that User 1, User 2, and User 3 all like Pink Floyd. User 1 & User 2 also like Billy Joel, but User 3 has never listened to them. The algorithm notices this, and will then recommend Billy Joel to User 3 since they share similar taste with Users 1 & 2. This is similar to how Spotify uses CF to curate recommendations, but instead of having users rate their listening experience, it uses implicit data as mentioned to the left.

Natural Language Processing (NLP)

NLP is the brains behind Alexa and Google Home. Simply put, it’s how AI analyzes human language in text form. Spotify uses AI and NLP to track data from other platforms on the internet to learn more about an artist or song. The NLP model filters through thousands of websites, blogs, social media posts, and more to find the common phrases and words used within them. These common phrases and words are then grouped into categories relating to the artist or song, and each word or phrase carries a weight reflecting its importance. This part of the process matters for understanding what labels listeners put on songs, so they can be better categorized and recommended to users. For example, if a set of songs that aren’t remotely close to each other are all labeled as “Tik-Tok” songs, and a user has a tendency to listen to that category of songs, then the NLP model knows to recommend more songs to the user from the same group. And since these songs aren’t close to each other in genre, using NLP to filter this kind of metadata opens a new avenue for personalized music.
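
A hedged sketch of the “find the common words and weight them” step, using TF-IDF from scikit-learn over a few made-up blog snippets (this is an illustration of the general technique, not Spotify’s actual pipeline):

```python
# Toy NLP step: weight the words used to describe a song across several (made-up) posts.
# TF-IDF gives high weight to words that are distinctive for these posts, not generic.
from sklearn.feature_extraction.text import TfidfVectorizer

posts = [
    "this dreamy synth track blew up on tik tok last week",
    "a dreamy bedroom pop song with lo-fi synth textures",
    "heard it on tik tok, instant add to my chill playlist",
]

vec = TfidfVectorizer(stop_words="english")
tfidf = vec.fit_transform(posts)

# Average each word's weight across the posts and show the heaviest descriptors
weights = tfidf.mean(axis=0).A1
terms = vec.get_feature_names_out()
top = sorted(zip(terms, weights), key=lambda t: -t[1])[:5]
print(top)   # repeated descriptors like 'dreamy', 'synth', 'tik', 'tok' carry real weight
```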

Audio Models

Audio models are used to detect what the NLP might not. They use the audio data itself to categorize songs more accurately, helping Spotify categorize songs and artists regardless of their fame online. By listening solely to the audio of a song, the audio model can detect the genre the song belongs to, and the collaborative filtering model can then recommend the track to similar users (this is how we find our “underground” artists!). This is possible through something known as Convolutional Neural Networks (CNNs), famed for their use in facial recognition. And while the iPhone might use image and pixel data to detect a face, Spotify uses audio data to detect a song’s genre. By sending the song through a CNN, the core features of the song (like tempo, key, and rhythm) are singled out, stored in Spotify’s databases, and then filtered through to identify characteristics and group similar tracks together.
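
A rough, hedged sketch of the idea: turn a song’s audio into a mel-spectrogram (an image-like representation) and push it through a small CNN that predicts a genre. The genre labels, layer sizes, and file path below are placeholders, and this is nothing like Spotify’s real model:

```python
# Toy "audio model": treat a song's mel-spectrogram like an image and classify its genre.
import librosa
import numpy as np
import torch
import torch.nn as nn

GENRES = ["pop", "rock", "hip-hop", "jazz", "classical"]  # hypothetical label set

def song_to_spectrogram(path, sr=22050, n_mels=128):
    y, sr = librosa.load(path, sr=sr, duration=30)             # first 30 seconds
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)              # log-scale, image-like
    return torch.tensor(mel_db).unsqueeze(0).unsqueeze(0).float()  # (1, 1, mels, frames)

class GenreCNN(nn.Module):
    def __init__(self, n_genres=len(GENRES)):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.AdaptiveAvgPool2d(1),                     # squash to one vector per song
        )
        self.fc = nn.Linear(32, n_genres)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

# Untrained forward pass, just to show the shape of the pipeline:
spec = song_to_spectrogram("some_song.mp3")   # placeholder path
logits = GenreCNN()(spec)
print(GENRES[logits.argmax().item()])
```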

For Artists:

There are two parts to a song: the lyrics and the music. In the early days of music composition, many things went into consideration: a musician’s muse, their fans, their style, their genre, among so many others. A single artist, probably accompanied by other songwriters & musicians, had to take all of these factors into account while writing a song. The creative process was lengthy, taking anywhere from 4 hours to 4 months.

Taylor Swift on writing a song, photograph courtesy of Disney

Now, AI makes the creative process much smoother for artists. Using big data collected from streaming services like Apple Music and Spotify, artists are given information about their listeners’ demographics up front. The data is handed to them on a silver platter, making it easier to cater songs to an audience.

In the future, AI might even be able to create its own music, using something known as GANs (Generative Adversarial Networks). GANs are a type of neural network structure famously known for generating realistic photographs of human faces and even cartoon characters. A GAN works by pitting two neural network models against each other: the discriminator and the generator. The generator is responsible for creating images that look similar to the dataset, and the discriminator is responsible for detecting whether the images created are real or fake.

The production process for the instrumental part of a song could be handled by GANs. Using data like the tempo, rhythm, instruments, and key, the model can analyze these and cater its output to an artist’s preferences.
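
Here is a heavily simplified sketch of that generator-vs-discriminator setup in PyTorch, where the “music” is just a short sequence of note values in [0, 1]. Real training data, evaluation, and anything resembling actual audio are left out; it only shows the adversarial structure:

```python
# Toy GAN skeleton for short note sequences (16 notes, encoded as values in [0, 1]).
import torch
import torch.nn as nn

SEQ_LEN, NOISE_DIM = 16, 32

generator = nn.Sequential(               # noise -> fake 16-note "melody"
    nn.Linear(NOISE_DIM, 64), nn.ReLU(),
    nn.Linear(64, SEQ_LEN), nn.Sigmoid(),
)
discriminator = nn.Sequential(           # melody -> probability it is real
    nn.Linear(SEQ_LEN, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

real_melodies = torch.rand(8, SEQ_LEN)   # placeholder for a batch of real melodies

for step in range(100):
    # 1) Train the discriminator: real melodies -> 1, generated melodies -> 0
    fake = generator(torch.randn(8, NOISE_DIM)).detach()
    d_loss = bce(discriminator(real_melodies), torch.ones(8, 1)) + \
             bce(discriminator(fake), torch.zeros(8, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator: try to make the discriminator say "real"
    fake = generator(torch.randn(8, NOISE_DIM))
    g_loss = bce(discriminator(fake), torch.ones(8, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(generator(torch.randn(1, NOISE_DIM)))   # one generated (meaningless) melody
```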

Google’s Magenta is a research project seeking to use ML to aid an artist’s creative process. Working hand in hand with actual artists, the team uses RNNs and other neural networks to fulfill this goal. In 2019, Magenta worked with YACHT, an electro-pop group, to bring ML into their upcoming album. Here’s how it worked:

YACHT, courtesy of teamyacht.com
  1. YACHT took all 82 of their songs and separated each musical component (drum rhythm, bass lines, vocals, guitar chords, etc.)
  2. Each of these parts was broken into four-bar loops (see the loop-slicing sketch after this list).
  3. The loops were then put into an ML model that used MelodyRNN, PerformanceRNN, and SketchRNN (all parts of Magenta’s ML model).
  4. The model shot out new melodies based on their old ones.
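
Step 2, slicing a track into four-bar loops, might look something like this with the pretty_midi library. This assumes a single fixed tempo and 4/4 time, which real songs rarely guarantee, and the file name is a placeholder:

```python
# Hedged sketch of "break each part into four-bar loops" using pretty_midi.
import pretty_midi

def four_bar_loops(midi_path, bpm=120):
    pm = pretty_midi.PrettyMIDI(midi_path)
    seconds_per_bar = 4 * 60.0 / bpm          # 4 beats per bar in 4/4
    loop_len = 4 * seconds_per_bar            # a four-bar loop, in seconds

    loops = []
    total = pm.get_end_time()
    start = 0.0
    while start < total:
        end = start + loop_len
        # Collect every note (across all instruments) that begins inside this window
        notes = [n for inst in pm.instruments
                   for n in inst.notes
                   if start <= n.start < end]
        loops.append(notes)
        start = end
    return loops

loops = four_bar_loops("some_yacht_song.mid")     # placeholder file name
print(len(loops), "four-bar loops,", len(loops[0]), "notes in the first one")
```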

The same process could be used for creating lyrics, feeding their old songs & other inspiring pieces into a model to produce new lyrics. Then, the new lyrics and melodies can be put together to produce a new song!

Piano Genie is another project built by Magenta. It’s an “intelligent controller that maps 8-button input to a full 88-key piano in real-time,” or in layman’s terms, a way to play a full-sized piano using only 8 buttons. Trained on the same data as the PerformanceRNN, the Genie uses a time-varying mapping, in which each decision depends on what came before it (key, time, measure, etc.). As you press one of the eight buttons on the Piano Genie, the machine filters through what it learned from the PerformanceRNN dataset to find a pattern similar to the one currently being played. From there it plays the note that best fits the melody, and the whole cycle repeats!
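
The real Piano Genie is a trained encoder-decoder model; purely to illustrate the “time-varying mapping” idea, here is a toy decoder that takes one of 8 buttons plus the memory of previous presses and outputs one of 88 keys. None of this is Magenta’s actual code, and the model below is untrained:

```python
# Toy illustration of a learned button-to-key mapping, NOT Magenta's Piano Genie code.
# An LSTM keeps memory of previous button presses (the "time-varying" part) and,
# for each new press of one of 8 buttons, outputs a distribution over 88 piano keys.
import torch
import torch.nn as nn

class ToyButtonDecoder(nn.Module):
    def __init__(self, n_buttons=8, n_keys=88, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_buttons, 16)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.to_key = nn.Linear(hidden, n_keys)

    def forward(self, button, state=None):
        # button: tensor of shape (1, 1) holding a value 0-7
        h, state = self.lstm(self.embed(button), state)
        return self.to_key(h[:, -1]), state    # logits over 88 keys + updated memory

decoder = ToyButtonDecoder()   # a real controller would be trained on piano performances
state = None
for press in [0, 3, 5, 7]:                     # a user tapping four of the eight buttons
    logits, state = decoder(torch.tensor([[press]]), state)
    print("button", press, "-> key", logits.argmax().item())
```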

Watch the Piano Genie in action! (Courtesy of Google Magenta)

What’s Next For Music in AI?

Artificial Intelligence in music could be revolutionary for the industry, but (as always with the topic of AI) it raises some ethical concerns. The concern is sharpest in the creative process of music: for artists, writing a song is a raw and emotional act, a way to cope with heartbreak, trauma, loss, and just about anything else.

Whether it would be ethical or not for AI to create songs is a question for the future. And who knows, maybe one day in the future it’d be possible for AI to create personalized music for every single individual, knocking out the need for artists altogether! Either way, AI is slowly paving its way into the music industry, curating and making music for the listener.
