The goal of this repo is to document the development of user-item recommendation systems and to compare different methods. The dataset used is the Anime Recommendation Database 2020 from Kaggle. All the models can be downloaded from the Drive.
The two most common approaches to making recommendations are:
- Collaborative filtering – making predictions about the interests of a user by collecting preferences or taste information from many users.
- Content-based filtering – making predictions based on a description of the item and a profile of the user’s preferences.
For now, we will focus only on collaborative filtering.
## Predicting User-Anime Ratings
One way to recommend items is to predict the rating a user would give to an item, and then show the items the user hasn't seen that have the highest predicted ratings.
### Matrix Factorization
Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower-dimensional rectangular matrices. For example, using the anime rating matrix we can try to learn user and anime matrices whose product approximates it, with each row representing a user and each column an anime.
We never actually materialize this rating matrix, but it is helpful to picture it this way.
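For intuition, here is a minimal sketch with made-up numbers (not the repo's code) showing how two small factor matrices approximate the full rating matrix:

```python
import numpy as np

num_users, num_anime, k = 3, 4, 2  # toy sizes; k is the embedding dimension

U = np.random.rand(num_users, k)   # user matrix: one k-dimensional row per user
V = np.random.rand(num_anime, k)   # anime matrix: one k-dimensional row per anime

# The full rating matrix is approximated by the product of the two factors
R_hat = U @ V.T                    # shape (num_users, num_anime)

# A single predicted rating is just the dot product of two small vectors
print(np.allclose(R_hat[1, 2], U[1] @ V[2]))  # True
```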
#### Making the tf.keras.Model
Instead of user and item matrices, we will use embeddings, which will map each user and anime to a vector. In addition, we'll add a bias to each user and anime.
```python
class MatrixFactorizationModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim):
        super(MatrixFactorizationModel, self).__init__()
        self.embedding_dim = embedding_dim
        # Latent factors for each user and each anime
        self.user_embeddings = tf.keras.layers.Embedding(num_users, embedding_dim)
        self.item_embeddings = tf.keras.layers.Embedding(num_items, embedding_dim)
        # Per-user and per-item bias terms
        self.user_biases = tf.keras.layers.Embedding(num_users, 1)
        self.item_biases = tf.keras.layers.Embedding(num_items, 1)
        # Global bias
        self.bias = tf.Variable(tf.zeros([1]))
        self.dropout = tf.keras.layers.Dropout(.5)

    def call(self, inputs, training = False):
        ...
```
It is good practice to place a dropout layer over the embedding layers to prevent overfitting and, in turn, make the features more robust. To compute the predicted rating I use the formula `prediction = (user_embedding + user_bias) * (item_embedding + item_bias) + bias`.
```python
class MatrixFactorizationModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim):
        ...

    def call(self, inputs, training = False):
        # Each input row is a (user_id, item_id) pair
        user_ids = inputs[:, 0]
        item_ids = inputs[:, 1]
        user_embedding = self.user_embeddings(user_ids) + self.user_biases(user_ids)
        item_embedding = self.item_embeddings(item_ids) + self.item_biases(item_ids)
        if training:
            user_embedding = self.dropout(user_embedding, training = training)
            item_embedding = self.dropout(item_embedding, training = training)
        user_embedding = tf.reshape(user_embedding, [-1, self.embedding_dim])
        item_embedding = tf.reshape(item_embedding, [-1, self.embedding_dim])
        # Row-wise dot product plus the global bias gives the predicted rating
        dot = tf.keras.layers.Dot(axes=1)([user_embedding, item_embedding]) + self.bias
        return dot
```
To compile the model I used the Adam optimizer, MSE as the loss, and RMSE as a metric to track. The model is run [eagerly](https://www.tensorflow.org/guide/eager) to allow this style of model definition.
```python
mf_model = MatrixFactorizationModel(num_users = num_users,
                                    num_items = num_anime,
                                    embedding_dim = 64)
mf_model.compile(
    optimizer = tf.keras.optimizers.Adam(),
    loss = tf.keras.losses.MeanSquaredError(),
    metrics = [
        tf.keras.metrics.RootMeanSquaredError("RMSE")
    ],
    run_eagerly = True
)
```
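A sketch of how the model could then be trained; the `ratings` DataFrame and its column names (`user_id`, `anime_id`, `rating`) are assumptions for illustration:

```python
# Assumed: ids are already encoded as contiguous integers
X = ratings[["user_id", "anime_id"]].values  # shape (n, 2): matches inputs[:, 0] and inputs[:, 1]
y = ratings["rating"].values

mf_model.fit(X, y, batch_size = 1024, epochs = 10, validation_split = 0.1)
```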
### Neural Network
The neural network will try to learn user and item embeddings that model user-item interactions.
The model structure is quite similar to that of `MatrixFactorization`; however, now we concatenate the user and anime embeddings and pass them through two dense layers. The last layer has a single node, and its output represents the predicted rating.
#### Making the tf.keras.Model
Let's define the user and item embeddings and the two dense layers. I used a `relu` activation on the output layer because the predictions are non-negative numbers.
```python
class NeuralNetworkModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim):
        super(NeuralNetworkModel, self).__init__()
        self.embedding_dim = embedding_dim
        self.user_embeddings = tf.keras.layers.Embedding(num_users, embedding_dim)
        self.item_embeddings = tf.keras.layers.Embedding(num_items, embedding_dim)
        # Two dense layers; the single-node relu head outputs the rating
        self.dense1 = tf.keras.layers.Dense(64, activation='relu')
        self.dense2 = tf.keras.layers.Dense(1, activation='relu')
        self.concat = tf.keras.layers.Concatenate()
        self.dropout = tf.keras.layers.Dropout(.5)

    def call(self, inputs, training = False):
        ...
```
We'll take user and item embeddings, apply dropout after each, and concatenate them. Then we'll pass them through two dense layers.
```python
class NeuralNetworkModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim):
        ...

    def call(self, inputs, training = False):
        user_ids = inputs[:, 0]
        item_ids = inputs[:, 1]
        user_embedding = self.user_embeddings(user_ids)
        item_embedding = self.item_embeddings(item_ids)
        if training:
            user_embedding = self.dropout(user_embedding, training = training)
            item_embedding = self.dropout(item_embedding, training = training)
        user_embedding = tf.reshape(user_embedding, [-1, self.embedding_dim])
        item_embedding = tf.reshape(item_embedding, [-1, self.embedding_dim])
        x = self.concat([user_embedding, item_embedding])
        x = self.dense1(x)
        x = self.dense2(x)
        return x
```
Similarly to `MatrixFactorization`, I compiled the model with the Adam optimizer, MSE as the loss, and RMSE as a metric to track. The model is run [eagerly](https://www.tensorflow.org/guide/eager) to allow this style of model definition.
```python
nn_model = NeuralNetworkModel(num_users = num_users,
                              num_items = num_anime,
                              embedding_dim = 64)
nn_model.compile(
    optimizer = tf.keras.optimizers.Adam(),
    loss = tf.keras.losses.MeanSquaredError(),
    metrics = [
        tf.keras.metrics.RootMeanSquaredError("RMSE")
    ],
    run_eagerly = True
)
```
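Once trained, either model can produce recommendations by scoring every anime a user hasn't seen and keeping the top N, as described earlier. A minimal sketch, where `seen` (the anime ids the user has already rated) is an assumed input:

```python
import numpy as np

def recommend(model, user_id, seen, num_anime, top_n = 10):
    # Score all unseen anime for this user in a single batch
    candidates = np.setdiff1d(np.arange(num_anime), seen)
    inputs = np.stack([np.full_like(candidates, user_id), candidates], axis = 1)
    scores = model.predict(inputs).flatten()
    # Return the ids with the highest predicted ratings
    return candidates[np.argsort(scores)[::-1][:top_n]]
```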
## Predicting User-Anime Interactions
Being able to predict what rating a user will give an anime is nice. However, our goal is to recommend anime, so we can rephrase our question: will a user have a positive interaction with an anime?
Let’s define positive and negative interactions as:
- `1`: the user liked the anime, watched all of it, and gave it a good rating;
- `0`: the user dropped the anime or gave it a low rating.
Now, we can use exactly the same model structures, except with a sigmoid function applied to the output.
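For instance, one way to express this for the neural network (a sketch, not necessarily how the repo defines it) is to swap the `relu` head for a `sigmoid` head and train with binary cross-entropy:

```python
class InteractionModel(NeuralNetworkModel):
    def __init__(self, num_users, num_items, embedding_dim):
        super(InteractionModel, self).__init__(num_users, num_items, embedding_dim)
        # The output is now a probability of a positive interaction
        self.dense2 = tf.keras.layers.Dense(1, activation = 'sigmoid')

in_model = InteractionModel(num_users, num_anime, 64)
in_model.compile(
    optimizer = tf.keras.optimizers.Adam(),
    loss = tf.keras.losses.BinaryCrossentropy(),
    metrics = [tf.keras.metrics.BinaryAccuracy("Acc")],
    run_eagerly = True
)
```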
### Dataset
Our anime dataset doesn’t have a positive/negative interaction column, so we will create it ourselves.
We’ll keep only the Completed and Dropped watching statuses. All anime rated 5 or lower, or that were dropped, are considered negative interactions; the rest are positive.
`watching_status` values:
- `1`: Currently Watching
- `2`: Completed
- `3`: On Hold
- `4`: Dropped
- `6`: Plan to Watch
```python
# Keep only rows whose watching_status is 2 (Completed) or 4 (Dropped)
interactions = interactions[interactions["watching_status"].isin([2, 4])]

def transform(df):
    # Anything rated 5 or lower, or dropped, becomes a negative interaction (0);
    # completed anime rated above 5 become positive interactions (1)
    df.loc[df["rating"] <= 5, ["watching_status"]] = 0.0
    df.loc[df["watching_status"] == 2, ["watching_status"]] = 1.0
    df.loc[df["watching_status"] == 4, ["watching_status"]] = 0.0
    df.rename(columns = {'watching_status': 'interaction'}, inplace = True)

transform(interactions)

# Keep only users with more than 4 interactions
# (.index holds the user ids; counts themselves are not needed)
good = interactions['user_id'].value_counts().loc[lambda x: x > 4].index
interactions = interactions[interactions["user_id"].isin(good)]
```
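One detail worth noting: the `Embedding` layers expect contiguous integer ids in `[0, num_users)` and `[0, num_anime)`. A sketch of one way to encode the raw ids (the `anime_id` column name is an assumption):

```python
# Map raw ids to contiguous integer codes for the Embedding layers
interactions["user_id"] = interactions["user_id"].astype("category").cat.codes
interactions["anime_id"] = interactions["anime_id"].astype("category").cat.codes

num_users = interactions["user_id"].nunique()
num_anime = interactions["anime_id"].nunique()
```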
## Hybrid (Predicting User-Anime Ratings and Interactions)
(IMHO) A model with multiple tasks can perform better because it learns from more diverse data. We can make a single predictor that achieves both tasks: predicting user-item ratings and predicting interactions.
As it has two tasks, the model's loss function is the sum of the two per-task losses, each multiplied by an importance weight: `loss = λ₁ · MSE(rating) + λ₂ · BCE(interaction)`.
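To make the formula concrete, here is a toy check of the weighted sum with the weights used below (λ₁ = 1.0, λ₂ = 5.0); the batch values are made up:

```python
mse = tf.keras.losses.MeanSquaredError()
bce = tf.keras.losses.BinaryCrossentropy()

# Hypothetical batch: true/predicted ratings and interactions
total = 1.0 * mse([8.0, 6.0], [7.0, 6.5]) + 5.0 * bce([1.0, 0.0], [0.9, 0.2])
print(float(total))  # the same weighted sum Keras computes from loss_weights
```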
### Model
#### Making the tf.keras.Model
The model structure is similar to the neural networks used for the rating and interaction predictions; however, it now has both outputs (the `dense2` and `dense3` layers).
```python
class HybridNeuralNetworkModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim):
        super(HybridNeuralNetworkModel, self).__init__()
        self.embedding_dim = embedding_dim
        self.user_embeddings = tf.keras.layers.Embedding(num_users, embedding_dim)
        self.item_embeddings = tf.keras.layers.Embedding(num_items, embedding_dim)
        self.dense1 = tf.keras.layers.Dense(64, activation='relu')
        # Two heads: relu for the rating, sigmoid for the interaction probability
        self.dense2 = tf.keras.layers.Dense(1, activation='relu')
        self.dense3 = tf.keras.layers.Dense(1, activation='sigmoid')
        self.concat = tf.keras.layers.Concatenate()
        self.dropout = tf.keras.layers.Dropout(.5)

    def call(self, inputs, training = False):
        ...
```
The `call` method is also similar, except that it now returns two outputs.
```python
class HybridNeuralNetworkModel(tf.keras.Model):
    def __init__(self, num_users, num_items, embedding_dim):
        ...

    def call(self, inputs, training = False):
        user_ids = inputs[:, 0]
        item_ids = inputs[:, 1]
        user_embedding = self.user_embeddings(user_ids)
        item_embedding = self.item_embeddings(item_ids)
        if training:
            user_embedding = self.dropout(user_embedding, training = training)
            item_embedding = self.dropout(item_embedding, training = training)
        user_embedding = tf.reshape(user_embedding, [-1, self.embedding_dim])
        item_embedding = tf.reshape(item_embedding, [-1, self.embedding_dim])
        conc = self.concat([user_embedding, item_embedding])
        x = self.dense1(conc)
        x1 = self.dense2(x)  # predicted rating
        x2 = self.dense3(x)  # predicted interaction probability
        return x1, x2
```
Each output has its own loss function, as well as its own evaluation metrics. The λ values from the formula above are configured via `loss_weights`.
```python
hnn_model = HybridNeuralNetworkModel(num_users = num_users,
                                     num_items = num_anime,
                                     embedding_dim = 64)
hnn_model.compile(
    optimizer = tf.keras.optimizers.Adam(),
    loss = {
        "output_1": tf.keras.losses.MeanSquaredError(),
        "output_2": tf.keras.losses.BinaryCrossentropy()
    },
    loss_weights = {
        "output_1": 1.0,
        "output_2": 5.0
    },
    metrics = {
        "output_1": [
            tf.keras.metrics.RootMeanSquaredError("RMSE")
        ],
        "output_2": [
            tf.keras.metrics.BinaryAccuracy("Acc"),
            tf.keras.metrics.Precision(name = "P"),
            tf.keras.metrics.Recall(name = "R")
        ]
    },
    run_eagerly = True
)
```
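A sketch of fitting the hybrid model with both targets; the `output_1`/`output_2` keys match the names Keras assigns to the two returned tensors, while the `data` DataFrame and its columns are assumptions:

```python
X = data[["user_id", "anime_id"]].values
y = {
    "output_1": data["rating"].values,       # regression target
    "output_2": data["interaction"].values,  # binary target
}
hnn_model.fit(X, y, batch_size = 1024, epochs = 10, validation_split = 0.1)
```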
## Predicting Ratings

| Model | val_loss | RMSE |
| --- | --- | --- |
| Hybrid NN | 1.9694 | 1.4034 |
| Neural Network | 1.9897 | 1.4105 |
| Matrix Factorization | 2.8429 | 1.6861 |
## Predicting Interactions

| Model | val_loss | Acc | Precision | Recall |
| --- | --- | --- | --- | --- |
| Hybrid NN | 0.3079 | 0.8790 | 0.8931 | 0.9759 |
| Neural Network | 0.2502 | 0.8050 | 0.8907 | 0.8282 |
| Matrix Factorization | 0.4691 | 0.7678 | 0.8969 | 0.7621 |