Ritom Sen
2025
https://github.com/ritomsen/music_recommendation_backend
As we prepare for a future with AI, I think it is important to consider how our lives can be improved by AI-based personalization. For me, I love music, and as a Spotify user I am constantly frustrated that when I turn on “smart shuffle” with my saved songs, it still plays bullshit that is just not the vibe, and I have to keep skipping songs. It sometimes picks songs that ruin the mood, which is pretty annoying. I feel like we can get to a point in technology where our phones just know what we want to hear next.
This stems from the fact that there is more to what determines the perfect song for the moment than just our listening history and tendencies. Your surroundings, the noises around you, the current weather, what you are doing, what you did yesterday, how you are feeling, and other context all factor in, alongside your listening tendencies, to what the best song to listen to at this very moment is. Especially as we get new AI devices that can perceive and understand more real-time data, I think it is interesting to start developing novel ways for these large models to help determine the best song for us.
My idea was simple: use LLMs to take in all this context, search through the space of candidate songs, and decide which song is best. My system would take in a picture (meant to be of wherever the user currently is), their location, audio of their surroundings (I didn’t get to implement this before I started working at Google!), and the user’s Spotify account, and then figure out the best song to recommend from there. But I didn’t think it would be effective to create a candidate pool of songs, naively put all of them in one prompt, and ask an LLM what the best song for the moment was. I thought this was too much context for an LLM to properly understand and make a decent decision on (at least at the time I was building it). Instead, I thought of two different approaches to help the LLM search through this space of candidate songs and make a fairly optimal decision:

It is important to note, though, that a globally optimal recommendation isn’t necessary here. From the user’s perspective, as long as a song fits the vibe and they don’t want to skip it, I think that is good enough.
I think of it like this: there are three categories a song could fall into.
Elephant in the room: I am skipping “bad rec for the vibes but I won’t skip”, because for the use case of this project, a bad rec for the vibes kinda always implies that you are going to skip, so this case can never happen.
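To make the categories concrete, here is a minimal sketch of one way to read them: cross "fits the vibe" with "got skipped", then drop the one combination ruled out above. The names `Outcome`, `classify`, and `is_good_enough` are made up for illustration and are not from the project repo, and the exact wording of the three categories is my assumption.

```python
from enum import Enum
from itertools import product
from typing import Optional


class Outcome(Enum):
    # Hypothetical labels for the three reachable categories.
    GOOD_REC_KEPT = "fits the vibe, not skipped"
    GOOD_REC_SKIPPED = "fits the vibe, skipped anyway"
    BAD_REC_SKIPPED = "does not fit the vibe, skipped"


def classify(fits_vibe: bool, skipped: bool) -> Optional[Outcome]:
    """Map a (fits_vibe, skipped) pair to a category.

    Returns None for "bad rec but not skipped", which is assumed
    never to happen for this use case.
    """
    if fits_vibe:
        return Outcome.GOOD_REC_SKIPPED if skipped else Outcome.GOOD_REC_KEPT
    return Outcome.BAD_REC_SKIPPED if skipped else None


def is_good_enough(fits_vibe: bool, skipped: bool) -> bool:
    """The 'good enough' bar: the song fits the vibe and is not skipped."""
    return classify(fits_vibe, skipped) is Outcome.GOOD_REC_KEPT


# Enumerate all four combinations and keep only the reachable ones.
reachable = [
    outcome
    for fits, skip in product([True, False], repeat=2)
    if (outcome := classify(fits, skip)) is not None
]
```

Under this reading, the system only has to land in `GOOD_REC_KEPT`; it does not have to find the single best song in the whole pool.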