movie visualization

based on user ratings

Above is a visualization of the 2,000 most-reviewed movies on MovieLens. I plan to add many more, but I need to make the visualization more efficient first. Unfortunately, the dataset I found only contains movies made before 2018, and most movies made after 2010 don't have many reviews, so they won't appear here either. The visualization is pretty slow when zoomed all the way out, so I recommend zooming in before you try to move around.

Generally, the movies are grouped by genre (early 2000s comedy, 90s rom-com, etc.), but there are some outliers. For example, on the right side there is a distinct group of movies that don't seem to share a genre or time period. Instead, these seem to be either award-winning or just plain "good" films.

How it works

This visualization is based on user ratings from the website MovieLens. You can find the dataset here. The idea is that if a given user likes two movies together more than they would like each one individually, the two movies are similar. Then, we can use a Siamese Network to create vector embeddings of each movie and convert them to 2D using t-SNE. A Siamese Network is a special kind of neural network where two inputs are passed through the same network at the same time, and the loss function is computed based on the distance between the two outputs (here, the embeddings of the two movies).
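To make the Siamese idea concrete, here is a minimal sketch in plain Python. It is not the code behind this visualization: the movie IDs, the embedding size, and the function names are all placeholders I made up for illustration, and the "network" is reduced to a single shared lookup table. The point is only that both inputs go through the same weights before their distance is measured.

```python
import math
import random

random.seed(0)

EMBEDDING_DIM = 8  # hypothetical size, chosen only for illustration
movie_ids = ["movie_a", "movie_b", "movie_c"]  # placeholder IDs, not real data

# One shared table of weights. Both inputs use this same table,
# which is what makes the setup "Siamese".
embeddings = {
    movie_id: [random.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for movie_id in movie_ids
}


def embed(movie_id):
    """Look up a movie's vector in the shared table."""
    return embeddings[movie_id]


def siamese_distance(movie_1, movie_2):
    """Embed both movies with the same weights and return their Euclidean distance."""
    return math.dist(embed(movie_1), embed(movie_2))


print(siamese_distance("movie_a", "movie_b"))
```

During training, a loss function would turn this distance into a signal for updating the shared table. Afterwards, the learned vectors could be handed to a t-SNE implementation to get 2D coordinates.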

If two movies are similar, they should be moved closer together, and if they are not, they should be moved apart. This means that movies are grouped by user preference rather than by topic. This is why movies like Before Sunrise and Before Sunset are not near other romance movies but are still close to each other. Of course, many people really enjoy certain genres, which is why you still see small pockets of them.
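The "pull together, push apart" rule can be written as a loss on a pair of embeddings. The sketch below shows one common form of pairwise contrastive loss. The margin value and the exact formula are assumptions for illustration, not necessarily what this project used.

```python
import math


def contrastive_loss(u, v, similar, margin=1.0):
    """One common form of pairwise contrastive loss.

    Similar pairs are penalized by their squared distance, so training
    pulls them together. Dissimilar pairs are penalized only while they
    are closer than `margin`, so training pushes them apart up to that point.
    """
    d = math.dist(u, v)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


# A similar pair that is far apart gets a large loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=True))   # 25.0
# A dissimilar pair that is already far apart gets no loss.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], similar=False))  # 0.0
# A dissimilar pair that is too close gets pushed apart.
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], similar=False))  # 0.25
```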

Initially, I implemented a typical contrastive loss function, but this gave me mediocre results. Instead, I used Triplet Loss, which passes three inputs to the model: the movie whose embedding we want to learn (the anchor), a similar movie (the positive), and a dissimilar movie (the negative).
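For reference, the standard triplet loss looks roughly like this. This is a sketch under assumptions: I am using Euclidean distance and a margin of 1.0 for illustration, and the names `anchor`, `positive`, and `negative` are the conventional ones rather than anything taken from this project's code.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the positive is closer to the anchor than
    the negative by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


anchor = [0.0, 0.0]
near = [1.0, 0.0]  # distance 1 from the anchor
far = [3.0, 4.0]   # distance 5 from the anchor

# Well-ordered triplet: the similar movie is already much closer.
print(triplet_loss(anchor, positive=near, negative=far))  # 0.0
# Badly ordered triplet: the "similar" movie is farther than the "different" one.
print(triplet_loss(anchor, positive=far, negative=near))  # 5.0
```

Unlike the pairwise version, this only cares about relative distances: a triplet contributes nothing once the similar movie is closer than the different one by the margin, no matter how far apart everything is in absolute terms.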

This project was totally inspired by Nathan Rooy, who initially did the same thing for books. It's really cool, and you should definitely check it out.

Here is the citation to the dataset's original paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872