We at Spotify love music. We love data, too. Because our data is about music and how over 100 million real people listen to it, it’s actually quite human. As our own Matt Ogle said of the popular Discover Weekly feature, “Man vs. machine is no longer a useful distinction in terms of how we build stuff. This is an algorithmic recommendation but it’s humans all the way down. People are inspiring what’s happening.”
Data can also turn into a kind of art when visualized: not so much in straightforward bar graphs as in complex, ambitious depictions of datasets that have real-world meaning.
Let’s explore some music data through 10 pieces of music data art, organized by manager of cultural insights Jomar Perez. Most of these were created by our analysts, who spend an inordinate amount of time with data, and who created these visualizations in their spare time. The resulting artworks are presented here in alphabetical order; you can click or tap the images to expand them for the maximum effect.
To create City Lights, data scientist Ian Anderson and analyst Manish Nag looked at a single day of anonymous, aggregated listening in New York City and created a point on the above geographical map to represent the locations of listeners at 4am, 8:30am, noon, and 6pm ET (clockwise from the top left). As you can see, listening is pretty low at 4am, but it spikes during the morning and evening commuting hours, with a dip around noon.
Familiarity vs. Discovery
Senior analytics manager Edward Lee started by pulling five months of streaming data for a handful of Spotify employees, with their permission, of course. For each listener, he looked at three things: when the track was played, how long it was played, and the artist who released the track. He plotted each session along two axes: time (vertical) and discovery (horizontal). The line moves down as the listener plays music. It moves left if the currently playing song is by the same artist as the previous song, moves right if the song is by an artist that the listener has never played before, and goes somewhere in between if the song is by a different artist that the listener is already familiar with. The color of each line indicates where on the spectrum it ends up; blue means more familiar, and green means more discovery.
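The plotting rule above can be sketched in a few lines of Python. This is only an illustration, not Edward Lee's actual code: the step sizes, the one-unit-down-per-play rule, and the function name `session_path` are all assumptions, since the description gives directions but no magnitudes.

```python
from typing import Iterable, List, Tuple

# Hypothetical horizontal steps; the article gives directions only.
STEP_SAME_ARTIST = -1.0   # same artist as the previous track: move left
STEP_NEW_ARTIST = 1.0     # artist never played before: move right
STEP_KNOWN_ARTIST = 0.0   # different but familiar artist: in between


def session_path(artists: Iterable[str],
                 known: Iterable[str] = ()) -> List[Tuple[float, int]]:
    """Return the (x, y) points of one session's line, starting at (0, 0).

    `known` holds artists the listener played before this session.
    """
    seen = set(known)
    previous = None
    x, y = 0.0, 0
    points = [(x, y)]
    for artist in artists:
        if artist == previous:
            x += STEP_SAME_ARTIST
        elif artist not in seen:
            x += STEP_NEW_ARTIST
        else:
            x += STEP_KNOWN_ARTIST
        y -= 1  # assumption: the line drops one unit per play
        points.append((x, y))
        seen.add(artist)
        previous = artist
    return points


path = session_path(["A", "A", "B", "A"])
```

The final x value is what would drive the line's color in this sketch: a session that ends far to the left is mostly familiar listening, one that ends far to the right is mostly discovery.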
Advanced analytics lead Santiago Gil created the above density plot by mapping the number of monthly listeners of each genre’s artists against how much they over-indexed among users who converted to Spotify Premium during a certain timeframe.
Web developer Peter Margaritoff took the average color of the album covers of the top 2,000 songs on Spotify, divided them by genres (these are the rows), and sorted the rows by hue, so you can make out the color composition of the following genres, from top to bottom: All, Alternative Rock, Classical, Country, Folk, Hip Hop, House, Indie Rock, Jazz, Latin, Metal, Pop, Rock.
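For readers curious how the averaging and sorting might work, here is a minimal Python sketch. It assumes each cover has already been decoded into a list of (R, G, B) pixels; the function names are hypothetical, and the article doesn't say which tools or color space were actually used.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def average_color(pixels: Sequence[RGB]) -> RGB:
    """Mean of each channel across all pixels of one album cover."""
    n = len(pixels)
    return tuple(round(sum(p[i] for p in pixels) / n) for i in range(3))


def hue(color: RGB) -> float:
    """Hue in [0, 1), using the standard library's RGB-to-HSV conversion."""
    r, g, b = (channel / 255 for channel in color)
    return colorsys.rgb_to_hsv(r, g, b)[0]


def sorted_by_hue(colors: Sequence[RGB]) -> List[RGB]:
    """Order one genre's row of average colors from red through to violet."""
    return sorted(colors, key=hue)


row = sorted_by_hue([(0, 0, 255), (255, 0, 0), (0, 255, 0)])
```

Each genre row in the image would then be one call to `sorted_by_hue` over the average colors of that genre's covers.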
Product analyst Ye Zhao, creator of The Matrix, explains: “Rendering of the adjacency matrix of the top 2880 artists and top 1800 (city, region) tuples in the US. (2880 x 1800) for the retina display screen resolution. The x-axis pixels represent the different artists, and the y-axis pixels represent the different (city, region) tuples. And the intensity of each pixel represents the normalized play counts of the songs by the artist in the month of January.”
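A small Python sketch of such a matrix follows. The quote doesn't say how the play counts were normalized, so dividing by the largest count is an assumption here, as is the function name `play_matrix`.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple


def play_matrix(plays: Iterable[Tuple[Hashable, Hashable]],
                artists: Sequence[Hashable],
                places: Sequence[Hashable]) -> List[List[float]]:
    """Build a rows-by-columns grid of pixel intensities in [0, 1].

    Rows follow `places` (the y-axis) and columns follow `artists`
    (the x-axis). Each play is an (artist, place) pair.
    """
    counts = Counter(plays)
    peak = max(counts.values(), default=0)
    # Assumption: normalize by the single largest count in the matrix.
    return [[counts[(artist, place)] / peak if peak else 0.0
             for artist in artists]
            for place in places]


nyc, austin = ("New York", "NY"), ("Austin", "TX")
grid = play_matrix([("A", nyc), ("A", nyc), ("B", austin)],
                   artists=["A", "B"], places=[nyc, austin])
```

At full scale the grid would have 1800 rows and 2880 columns, one pixel per (place, artist) pair.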
Designer Shu Li assigned a different color to each track that topped the Spotify charts in each week from July to December 2015. The colors correlate to the album art and music video for each song. Major Lazer’s “Lean On” (pink) dominated the summer months, although One Direction’s “Drag Me Down” took the fore for a brief period, indicated by the orange stripe. Fall and winter belonged to Justin Bieber’s “Sorry” (grey) and Adele’s “Hello” (black).
Data scientist and researcher Zac Pustejovsky built the above two-dimensional representation of how a music recommendation system might view its listeners. It’s based on the anonymized, aggregated listening habits of a subset of Spotify listeners who played music from 5,000 artists over a two-week period. He ran a matrix factorization (which is a common way to build a recommendation system) and then used a manifold learning algorithm called t-SNE to project the system into two dimensions. Finally, he mapped the users’ favorite genres onto the points.
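To give a feel for the matrix factorization step only, here is a toy Python sketch that learns one small vector per listener and per artist by stochastic gradient descent. It is not the method used for the piece: the algorithm variant, rank, learning rate, and the choice to treat zeros as observed values are all assumptions, and the projection to two dimensions is left out.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def factorize(matrix: Matrix, rank: int = 2, steps: int = 200,
              lr: float = 0.05, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Approximate matrix[u][i] by the dot product of U[u] and V[i]."""
    rng = random.Random(seed)
    n_users, n_items = len(matrix), len(matrix[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u in range(n_users):
            for i in range(n_items):
                predicted = sum(U[u][k] * V[i][k] for k in range(rank))
                err = matrix[u][i] - predicted
                for k in range(rank):
                    # Gradient step on the squared error for this cell.
                    uk, vk = U[u][k], V[i][k]
                    U[u][k] += lr * err * vk
                    V[i][k] += lr * err * uk
    return U, V


def squared_error(matrix: Matrix, U: Matrix, V: Matrix) -> float:
    """Total squared difference between the matrix and its reconstruction."""
    return sum((matrix[u][i] - sum(a * b for a, b in zip(U[u], V[i]))) ** 2
               for u in range(len(matrix)) for i in range(len(matrix[0])))


# Toy play counts: listeners 0-1 favor artists 0-1, listeners 2-3 favor 2-3.
plays = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
listener_vectors, artist_vectors = factorize(plays)
```

In a pipeline like the one described, the learned listener vectors would be the input to the dimensionality-reduction step, so that listeners with similar vectors land near each other on the map.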
Data scientist Ariel Marcus created the above diagram to find out which Spotify features (search, radio play, playlist listening, and so on) have the largest impact on whether a listener becomes one of our more than 40 million paying subscribers. The purple dots represent premium subscribers. His visualization revealed that no individual feature is a better predictor of whether someone will subscribe than how long they have been listening to music on Spotify.
Vector Threads in Phase Space
This piece from analytics design lead & front-end developer Dan Delany is rife with complexity, but at its core, it’s about how Spotify users’ day-to-day listening patterns change over time. Each quadrant pairs two metrics (such as seconds played, plays over 30 seconds, number of days the user has been active, and so on), forming 2-D phase spaces representing all possible metric values. Day-to-day changes in each user’s metrics are projected into these 2-D spaces as vectors, which are averaged across users to create simplified uniform vector fields for each space.
As he says, “From this experiment we observe that particle advection makes pretty pictures. They might even be meaningful, if we can figure out how to interpret them.”
There’s also a really impressive interactive version here.
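The averaging step behind those vector fields can be sketched roughly as follows in Python. The grid binning, the `cell_size` parameter, and the function name are assumptions made for illustration; the article doesn't describe how the field was actually made uniform, and the particle advection itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def average_vector_field(user_series: Sequence[Sequence[Point]],
                         cell_size: float = 1.0) -> Dict[Cell, Point]:
    """Average day-to-day change vectors per grid cell of a 2-D phase space.

    Each user's series holds one (metric_a, metric_b) point per day.
    """
    sums: Dict[Cell, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for series in user_series:
        for (x0, y0), (x1, y1) in zip(series, series[1:]):
            # Assumption: a vector belongs to the cell where the day starts.
            cell = (int(x0 // cell_size), int(y0 // cell_size))
            acc = sums[cell]
            acc[0] += x1 - x0
            acc[1] += y1 - y0
            acc[2] += 1
    return {cell: (dx / n, dy / n) for cell, (dx, dy, n) in sums.items()}


field = average_vector_field([
    [(0.0, 0.0), (2.0, 1.0)],   # user 1: two consecutive days
    [(0.5, 0.5), (0.5, 2.5)],   # user 2: two consecutive days
])
```

Here both users start in the same cell, so their two change vectors are averaged into a single arrow for that cell.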
Product analyst Kevin Showkat and user researcher George Murphy portrayed the musical taste of 12 Spotify employees as galaxies. The sun at the center of each galaxy is a picture of the person whose taste is depicted. The glow’s color represents the person’s top genre; its size is their account age; and its intensity represents the number of days when the person listened out of the past 90 days. Each star represents an artist that the person streamed at least once in the given month; its size represents the number of times the artist was streamed that month; and its distance from the viewer reflects the number of unique tracks by the artist streamed that month. Finally, there are rings. Each galaxy gets one ring for every day at least one track was streamed in the month; its angle depends on the time when it was streamed; and its length represents the minutes each track was streamed out of every listening hour.