Movieterra — Movie Premiere Recommendations with Machine Learning
What interesting movie should I watch at the cinema today? Have you ever had to answer this question? We have, and more than once. When we watch the trailers for new films, they all look interesting. But often our time and money are wasted because the film turns out to be boring, the plot is drawn out, and it is not at all the movie we really wanted to see.
In this article we will say a few words about what we have been working on for the past four months: Movieterra, an interactive recommender system for movie premieres showing in cinemas. The system is based on machine learning (ML) and adapts to user preferences in real time. As big fans of new movies, we felt the need for such a service, and we believe it will be useful for every movie lover.
Parallels and concepts
Recommender systems (RS) are programs that rely on user data. Their aim is to select the goods and services most relevant to customers: music, books, websites and, in our case, films. Most film services have personalized recommender systems that work on the basis of collaborative filtering: recommendations are selected individually for each user, based on their preferences.
Netflix asks users to rate movies and TV shows to determine what they will want to watch next. The service also uses machine learning to pick the cover artwork for each movie that is most likely to appeal to the viewer.
Movielens asks the viewer to rate 15 films they have seen before, builds an individual taste profile from those ratings, and recommends similar films.
Flixster also asks you to rate movies. Beyond ratings, the service offers extras such as film quizzes, the ability to follow friends' ratings, and more.
The Movieterra recommender system is built on the principle of content-based filtering: recommendations are based on the description of the film and on user data. What sets it apart is that the viewer visits the site, enters the titles of their favorite movies in the search bar, and gets the most relevant premieres currently showing in cinemas. Registration is not required.
Implementation and algorithms
Content-based recommendations use the description of the film and the user's profile. Corresponding vectors are constructed from these data. The films whose vectors are closest to the user's vector make it into the recommendations.
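The "closest vector" step can be sketched as follows. This is a minimal illustration, not the Movieterra implementation: it assumes cosine similarity as the closeness measure (the article does not name the metric), and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_premieres(user_vector, premiere_vectors, top_n=3):
    """Return the top_n (title, score) pairs, most similar to the user first."""
    scored = [
        (title, cosine_similarity(user_vector, vector))
        for title, vector in premiere_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data: three premieres described by three-dimensional vectors.
premieres = {
    "Premiere A": [1.0, 0.0, 1.0],
    "Premiere B": [0.0, 1.0, 0.0],
    "Premiere C": [1.0, 1.0, 1.0],
}
ranking = rank_premieres([1.0, 0.0, 1.0], premieres)
```

Cosine similarity ignores vector length, so a user with a long history and a user with a short one are compared on the direction of their tastes only.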
Films are described by five characteristics: genre, keywords, cast, directors and writers. Vectors are built from genres and keywords. Cast, directors and writers serve as weighting coefficients.
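The article does not say how the cast, directors and writers enter the score, so the following is only one possible reading: the base similarity is multiplied by a factor that grows with the overlap between people the user has already seen and people involved in the premiere. The Jaccard overlap, the boost values, and all names here are assumptions made for illustration.

```python
def jaccard(a, b):
    """Share of common items between two collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def weighted_score(base_similarity, user_people, film_people, boosts=None):
    """Scale a similarity score by the overlap in cast, directors and writers.

    The boost values are hypothetical, not taken from the article.
    """
    boosts = boosts or {"cast": 0.1, "directors": 0.2, "writers": 0.1}
    factor = 1.0
    for role, boost in boosts.items():
        factor += boost * jaccard(user_people.get(role, ()), film_people.get(role, ()))
    return base_similarity * factor


user_people = {"directors": ["Director A"], "cast": ["Actor X", "Actor Y"]}
film_people = {"directors": ["Director A"], "cast": ["Actor Y", "Actor Z"]}
score = weighted_score(0.5, user_people, film_people)
```

With no overlap at all the factor stays at 1.0, so the people data can only raise a score, never lower it.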
To compose the user's genre vector, the algorithm averages the genre vectors of all films in their history. There are 18 genre categories in total:
[action, adventure, animation, comedy, crime, documentary, drama, family, fantasy, history, horror, music, mystery, romance, science fiction, thriller, war, western].
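A minimal sketch of the genre averaging, assuming each film is encoded as an 18-dimensional vector with 1.0 for every genre it belongs to and 0.0 elsewhere (the encoding is an assumption; the article only says that genre vectors are averaged). Function names are illustrative.

```python
GENRES = [
    "action", "adventure", "animation", "comedy", "crime", "documentary",
    "drama", "family", "fantasy", "history", "horror", "music", "mystery",
    "romance", "science fiction", "thriller", "war", "western",
]


def genre_vector(film_genres):
    """Encode a film's genres as an 18-dimensional 0/1 vector."""
    present = set(film_genres)
    return [1.0 if genre in present else 0.0 for genre in GENRES]


def user_genre_vector(history):
    """Average the genre vectors of every film in the user's history."""
    if not history:
        return [0.0] * len(GENRES)
    vectors = [genre_vector(film_genres) for film_genres in history]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Two watched films: both are action, one is also a comedy, the other a drama.
profile = user_genre_vector([["action", "comedy"], ["action", "drama"]])
```

In the resulting profile, each component is the share of watched films that belong to that genre.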
Genres give an idea of what the movie is about, but keywords provide a more specific characteristic.
To map keywords into a vector space we use Word2Vec, a technology from Google that is based on neural networks and designed for statistical processing of large text corpora. The model was trained on the keyword corpus of TMDB, which contains information about 300,000 films and their 500,000 keywords.
To obtain a film's keyword vector, the vectors of all its keywords are averaged.
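The averaging step can be sketched like this. A plain dictionary of toy two-dimensional vectors stands in for the trained Word2Vec model, and skipping keywords that have no vector is an assumption, as are the function and variable names.

```python
def film_keyword_vector(keywords, embeddings, dim):
    """Average the embedding vectors of a film's keywords.

    Keywords missing from `embeddings` are skipped; a film with no known
    keywords gets a zero vector of length `dim`.
    """
    known = [embeddings[keyword] for keyword in keywords if keyword in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy embeddings standing in for a trained model.
toy_embeddings = {
    "heist": [1.0, 0.0],
    "robot": [0.0, 1.0],
}
film_vector = film_keyword_vector(["heist", "robot", "unknown keyword"], toy_embeddings, dim=2)
```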
To determine the user's keyword vector, we compute the keyword vectors of all the movies they have viewed and, for each movie, calculate TF-IDF weighting coefficients in the context of all the user's keywords.
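One way to read this step is sketched below. It treats every viewed film as a "document" of keywords, computes a TF-IDF weight per keyword, and builds the user vector as a weighted average of keyword embeddings. The smoothed IDF formula, the way the weights are combined, and all names are assumptions; the article does not spell out these details.

```python
import math
from collections import Counter


def tfidf_per_film(history):
    """Return one {keyword: tf-idf weight} dict per film in the history.

    `history` is a list of keyword lists, one list per viewed film.
    IDF uses the smoothed form log((1 + N) / (1 + df)) + 1 (an assumption).
    """
    n_films = len(history)
    document_frequency = Counter()
    for keywords in history:
        document_frequency.update(set(keywords))

    weights = []
    for keywords in history:
        counts = Counter(keywords)
        total = sum(counts.values())
        weights.append({
            keyword: (count / total)
            * (math.log((1 + n_films) / (1 + document_frequency[keyword])) + 1.0)
            for keyword, count in counts.items()
        })
    return weights


def user_keyword_vector(history, embeddings, dim):
    """Weighted average of keyword embeddings, weighted by TF-IDF."""
    accumulated = [0.0] * dim
    weight_sum = 0.0
    for film_weights in tfidf_per_film(history):
        for keyword, weight in film_weights.items():
            vector = embeddings.get(keyword)
            if vector is None:
                continue  # keyword unknown to the embedding model
            for i, value in enumerate(vector):
                accumulated[i] += weight * value
            weight_sum += weight
    if weight_sum == 0:
        return accumulated
    return [value / weight_sum for value in accumulated]


history = [["heist", "robot"], ["heist", "space"]]
toy_embeddings = {"heist": [1.0, 0.0], "robot": [0.0, 1.0], "space": [0.0, 1.0]}
film_weights = tfidf_per_film(history)
user_vector = user_keyword_vector(history, toy_embeddings, dim=2)
```

Under this reading, a keyword that appears in every viewed film ("heist" above) gets a lower weight than one that appears in a single film, so distinctive keywords pull the user vector harder.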
Difficulties and solutions
We set ourselves the task of creating a system that takes viewed movies as input and returns relevant premieres showing in cinemas as output. During development, however, we faced the following difficulties:
Services don't provide data on users' cinema visits
The recommender must work for both registered and new users
Data from the Cinema Platform API is incomplete
The data is unstable: a film that is relevant today may not be relevant tomorrow
To work around the lack of data on users' cinema visits, we used content-based recommendation algorithms. They don't require large pre-built data sets and so address the lack of initial information about users.
To store and accumulate information about viewers, we decided to save the data in CSV files at first and later move to a MySQL database.
Content-based recommendation involves building a model from the known properties of films and compiling a user profile from the films the user has already watched. We used TMDB, which provides a movie API to registered users.
To keep the information about premieres up to date, we update the database every day and match new movies against TMDB.
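The daily refresh can be pictured as a comparison between the premieres already stored and the ones on show today. The sketch below covers only that comparison; fetching the listings and writing to the database are left out, and the function name and the identifiers are hypothetical.

```python
def diff_premieres(stored_ids, current_ids):
    """Compare stored premieres with today's listing.

    Returns (to_add, to_remove): ids that are new today and ids that are
    no longer showing, each as a sorted list.
    """
    stored, current = set(stored_ids), set(current_ids)
    return sorted(current - stored), sorted(stored - current)


# Yesterday's database held films 101-103; today 101 is gone and 104 is new.
to_add, to_remove = diff_premieres([101, 102, 103], [102, 103, 104])
```

Only the films in `to_add` would then need their descriptions looked up, which keeps the daily job small.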
Opportunities and features
Movieterra is very easy to use: you visit the site, enter the titles of your favorite movies in the search bar, and get the most relevant premieres showing at the cinema.
We did not find another recommender system that suggests relevant premieres currently showing at the cinema. We believe the service we have developed will appeal to viewers and be useful to them.
We continue improving Movieterra to provide more accurate movie recommendations to users.
Your feedback is very important to us. We would like to hear your comments and suggestions.