Mood Playlist Picker

Ben Kalish, Susie Riley, Yi Zhang

benkalish@u.northwestern.edu susieriley@u.northwestern.edu yizhang@u.northwestern.edu

EECS 352 - Machine Perception of Music and Audio with Bryan Pardo

Northwestern University

Browse the source code on GitHub

Methodology

Our Mood Playlist Picker generates playlists that match the user’s mood. The user describes the target mood on two scales: valence and energy. Valence specifies how positive or negative the playlist should feel, and energy (also called arousal) specifies how energetic it should be. Given the desired valence and energy, our application returns a playlist that matches that mood.

To determine valence and energy scores for songs, we trained a model on songs drawn from a preexisting dataset: a 10,000-song subset of the Million Song Dataset (MSD)(1). Each entry in the MSD has an Echo Nest ID. AcousticBrainz Labs(2) provides a mapping from the MSD Echo Nest ID to each track's Spotify ID, if the track exists on Spotify. The All Music Guide(3) provides genre labels for each track. Spotipy(4) allowed us to use the Spotify API from Python. Using these tools, we filtered the MSD subset down to the songs labeled Pop/Rock that are available in the US on Spotify. We then surveyed Northwestern students to collect user-rated valence and energy scores for 200 songs selected at random from the filtered subset, and we used these scores to train our model.
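The filtering and sampling steps above can be sketched roughly as follows. This is a minimal illustration, not our actual pipeline: the record layout (`track_id`, `echonest_id`), the lookup dictionaries, and both function names are hypothetical stand-ins for the MSD, AcousticBrainz Labs, genre, and Spotify data, and the fixed random seed is an assumption made only for reproducibility.

```python
import random


def filter_tracks(msd_tracks, echonest_to_spotify, genre_by_track,
                  markets_by_spotify_id, genre="Pop/Rock", market="US"):
    """Return Spotify IDs of tracks with the given genre that are available in `market`.

    All four inputs use a hypothetical layout chosen for this sketch.
    """
    kept = []
    for track in msd_tracks:
        spotify_id = echonest_to_spotify.get(track["echonest_id"])
        if spotify_id is None:
            continue  # no Echo Nest -> Spotify mapping for this track
        if genre_by_track.get(track["track_id"]) != genre:
            continue  # wrong genre, or no genre label at all
        if market not in markets_by_spotify_id.get(spotify_id, ()):
            continue  # not available in the requested market
        kept.append(spotify_id)
    return kept


def sample_for_survey(spotify_ids, n=200, seed=0):
    """Pick up to n songs at random for the rating survey (the seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(spotify_ids, min(n, len(spotify_ids)))
```

The songs returned by `sample_for_survey` would play the role of the labeled training set, and the rest of the filtered list would be the songs scored by the model later.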

Previous work on music mood classification on an Arousal-Valence scale has shown that Support Vector Regression (SVR) yields the best results(5). We therefore chose to train one SVR model for each dimension (Arousal, Valence). Our training set contains 200 A-V labeled songs, and the unlabeled set contains 500 songs. For each song, we obtain the following 9 Spotify features: tempo, key, loudness, mode, danceability, speechiness, acousticness, instrumentalness, and liveness.
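As an illustration of how the 9 features might be arranged into model inputs, the sketch below turns one song's feature dictionary into a fixed-order vector. The dictionary keys follow the feature names listed above; `to_feature_vector` and `standardize` are hypothetical helpers, and the z-score standardization step is an assumption on our part (a common preprocessing choice for SVR), not something the description above specifies.

```python
# The 9 features, in a fixed order so every song maps to the same vector layout.
FEATURE_NAMES = [
    "tempo", "key", "loudness", "mode", "danceability",
    "speechiness", "acousticness", "instrumentalness", "liveness",
]


def to_feature_vector(audio_features):
    """Convert one song's feature dictionary into a list of 9 floats."""
    return [float(audio_features[name]) for name in FEATURE_NAMES]


def standardize(rows):
    """Z-score each column (assumed preprocessing); constant columns become 0.0."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        (sum((x - m) ** 2 for x in col) / n) ** 0.5
        for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

Each standardized row would then be paired with that song's survey rating, once for Arousal and once for Valence, to train the two regressors.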

In training our model, we based our evaluation on 5-fold cross-validation performance, measured by mean squared error (MSE), as well as on the training R^2. We optimized the SVR models primarily by changing the kernel and the amount of regularization. Holding the amount of regularization constant, we found that RBF kernels achieve the highest training R^2, but Linear kernels achieve better cross-validation performance. Polynomial kernels yield results between Linear and RBF, as expected. Because models with RBF kernels tend to fit the training data much more tightly than models with Linear kernels, we believe RBF is overfitting in this case and therefore performs worse in cross-validation; a training R^2 of 70% to 80% seems too high for data as subjective and noisy as ours. We therefore chose the Linear kernel model with an empirically optimized regularization coefficient (0.2 for Arousal and 0.5 for Valence). In addition, in every model the features explain Arousal better than they explain Valence. This matches our expectation, since Valence is a more subjective concept than Arousal and is hard to define numerically.

Linear Kernel (regularization epsilon = 0.1)
Arousal Training R^2: 0.524
Valence Training R^2: 0.311
Arousal CV MSE: 0.60 (+/- 0.14)
Valence CV MSE: 0.74 (+/- 0.18)

RBF Kernel (regularization epsilon = 0.1)
Arousal Training R^2: 0.788
Valence Training R^2: 0.705
Arousal CV MSE: 1.08 (+/- 0.32)
Valence CV MSE: 1.08 (+/- 0.39)

Final Model:
Linear Kernel (regularization epsilon = 0.2 for Arousal and 0.5 for Valence)
Arousal Training R^2: 0.522
Valence Training R^2: 0.313
Arousal CV MSE: 0.59 (+/- 0.14)
Valence CV MSE: 0.71 (+/- 0.12)
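The two kinds of numbers reported above, training R^2 and 5-fold cross-validation MSE, can be computed along the following lines. This is a sketch in plain Python rather than our evaluation code: the fold assignment, the seed, and all function names are assumptions, and `fit_mean` is a trivial stand-in model (it always predicts the training mean) used only to keep the example self-contained. An SVR from a machine learning library would take its place in practice.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between ratings and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_mse(fit, X, y, k=5, seed=0):
    """Return one held-out MSE per fold; `fit(X, y)` must return a predict function."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(mean_squared_error([y[i] for i in test_idx], y_pred))
    return scores


def fit_mean(X_train, y_train):
    """Stand-in model (not SVR): always predicts the mean training rating."""
    mean_y = sum(y_train) / len(y_train)
    return lambda x: mean_y
```

Comparing the training R^2 with the per-fold MSE is what exposes the pattern described above: a model can score well on the data it was fit to while doing worse on held-out folds.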

Once we had settled on our model, we applied it to the remaining songs in the filtered dataset: we obtained the same 9 Spotify features for each song and used the model to assign it a valence and energy score. When a user specifies the desired valence and energy of their playlist, our application performs a Euclidean nearest-neighbor search to find the 10 songs closest to that valence and energy pair. The user can then add the playlist to their Spotify account and listen at their leisure.
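The nearest-neighbor lookup described above can be sketched as a brute-force search over the scored songs. The song dictionaries and the `nearest_songs` function are hypothetical; the sketch assumes each song already carries its predicted `valence` and `energy` on the same scale the user enters.

```python
import heapq
import math


def nearest_songs(scored_songs, valence, energy, k=10):
    """Return the k songs whose (valence, energy) is closest to the target point."""
    def distance(song):
        # Euclidean distance in the two-dimensional valence-energy plane.
        return math.hypot(song["valence"] - valence, song["energy"] - energy)

    # nsmallest returns the songs ordered from closest to farthest.
    return heapq.nsmallest(k, scored_songs, key=distance)
```

With only a few hundred songs, scanning every song per query is cheap, so a spatial index is unlikely to be needed at this scale.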

We tested our model numerically as shown above, and we also found that our playlists made sense qualitatively. A valence of 0 and an energy of 100 gives mostly heavy metal songs with a lot of screaming, while a valence of 100 and an energy of 0 gives mostly relaxed, cheerful songs.

(1) https://labrosa.ee.columbia.edu/millionsong/
(2) http://labs.acousticbrainz.org/million-song-dataset-echonest-archive
(3) http://www.ifs.tuwien.ac.at/mir/msd/
(4) https://github.com/plamere/spotipy
(5) http://www.cs.cmu.edu/~rbd/papers/emotion-ismir-09.pdf; https://pdfs.semanticscholar.org/a977/808a4a56d6adf0cd0bb011638ed8d2d97b99.pdf