Recommendation Engine with R

There is a class of web applications that need to make recommendations to users based on either the user’s past preferences or the preferences of similar users. This is achieved through a technique known as a recommendation engine.

Put simply, recommendation engines make product or service recommendations to people. These systems are at work behind the scenes in almost every popular e-commerce and social networking site today, such as Amazon, LinkedIn, Netflix and Facebook.

Recommendation engines have become extremely important because of the enormous variety of products carried by today’s e-commerce portals. In a traditional brick-and-mortar shop, the limited number of products on display makes it relatively easy for a buyer to browse the shelves. In an online store, the huge number of products makes it nearly impossible for a buyer to chance upon something she likes.

A recommendation engine addresses this challenge by narrowing down the choices, presenting buyers with a limited number of items that they are most likely to choose.

Recommendation systems can be based on a number of different technologies. Broadly, these can be summarized into two groups:

1) Content-Based Systems

These systems examine the properties of items and make recommendations based on them. For example, if a buyer at an online bookshop has purchased a number of science fiction books, the system will look into the database, locate other books tagged ‘science-fiction’ and recommend them to the buyer.

2) Collaborative Filtering Systems

These systems recommend items based on a similarity measure between users or between items. The recommended items are either similar to items the user has already purchased or are items purchased by similar users.
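To make the idea of a similarity measure concrete, here is a minimal sketch in base R that computes the cosine similarity between users on the items both have rated. The toy matrix, the user names and the cosine_similarity helper are made up for illustration; real packages may use other measures or handle missing ratings differently.

```r
# Toy user-by-item rating matrix (hypothetical data); NA means "not rated".
ratings <- matrix(
  c(5, 4, NA, 1,
    4, 5, 2, NA,
    1, NA, 5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("alice", "bob", "carol"),
                  c("item1", "item2", "item3", "item4"))
)

# Cosine similarity restricted to the items that both users have rated.
cosine_similarity <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

sim_alice_bob <- cosine_similarity(ratings["alice", ], ratings["bob", ])
sim_alice_carol <- cosine_similarity(ratings["alice", ], ratings["carol", ])

# Alice's tastes are closer to Bob's than to Carol's.
sim_alice_bob > sim_alice_carol
```

In this toy data Bob would be treated as the more useful neighbour when recommending items to Alice.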

Let’s try to build a recommendation engine using R.

For this example we will be using the recommenderlab package and the collaborative filtering approach.

A collaborative filtering algorithm can be used to produce either a prediction or a recommendation. A prediction is a numerical estimate of how much a user will like an item (for example a likelihood P, or a predicted rating), while a recommendation is a list of the items the user is expected to prefer most. Note that this list must not include items the user has already purchased.
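The difference between the two can be sketched in a few lines of base R. The scores, movie names and the top_n helper below are hypothetical; the point is only that a recommendation is the predictions sorted, filtered and cut to a short list.

```r
# Hypothetical predicted scores for one user (higher means more likely to be liked).
predicted <- c(movie_a = 4.6, movie_b = 3.1, movie_c = 4.9,
               movie_d = 2.0, movie_e = 4.2)

# Items the user already has must not be recommended again.
already_purchased <- c("movie_c")

# Turn predictions into a recommendation: drop known items, keep the n best.
top_n <- function(scores, exclude, n = 2) {
  candidates <- scores[!(names(scores) %in% exclude)]
  ranked <- names(sort(candidates, decreasing = TRUE))
  ranked[seq_len(min(n, length(ranked)))]
}

recommendation <- top_n(predicted, already_purchased, n = 2)
recommendation
```

Even though movie_c has the highest predicted score, it is left out because the user already owns it.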

We can use either item-based or user-based collaborative filtering. With user-based collaborative filtering, the opinions of previous users are taken into account to make a recommendation for a new user. Note, however, that this approach has a problem: users may have rated only a few items, so not enough data is available, and not much is known about the new user either.
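One common way to turn the opinions of other users into a prediction is a similarity-weighted average of their ratings. The sketch below uses made-up similarities and ratings, and the predict_rating helper is hypothetical; it is not necessarily the exact formula recommenderlab uses internally.

```r
# Hypothetical similarities between the target user and three other users.
similarity <- c(bob = 0.9, carol = 0.4, dave = 0.7)

# Their ratings for one movie the target user has not seen (NA = not rated).
neighbour_ratings <- c(bob = 4, carol = 2, dave = NA)

# Similarity-weighted average over the neighbours who rated the movie.
predict_rating <- function(sim, ratings) {
  rated <- !is.na(ratings)
  if (!any(rated)) return(NA_real_)  # nobody rated it: no prediction possible
  sum(sim[rated] * ratings[rated]) / sum(sim[rated])
}

predicted_rating <- predict_rating(similarity, neighbour_ratings)
predicted_rating
```

Dave is a fairly similar user, but because he has not rated the movie he contributes nothing, which is the sparse-data problem in miniature.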

Let’s use User-Based Collaborative Filtering (UBCF) to create a recommendation engine that recommends movies to a specific user.

First, load the recommenderlab package using library(recommenderlab).

The data we’re using is the MovieLense data. Use data("MovieLense") to load the data.

Before we begin, examine the structure using str(MovieLense).

You can even visualize the data by typing

image(MovieLense[1:25,1:25])

It’s already noticeable that a lot of items have not been rated.

To check the number of movies rated by each user, type summary(rowCounts(MovieLense)).

Since we need to check how the movies have been rated, let’s store the MovieLense ratings in a vector:
vector_ratings <- as.vector(MovieLense@data)

The unique rating values can be displayed by typing unique(vector_ratings).

Now check the count for each rating value by typing table_ratings <- table(vector_ratings); table_ratings

You can visualize this by typing barplot(table_ratings)

We can see that the majority of the ratings are 0, which means that a lot of items are unrated.
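If you want a number rather than a picture, the share of user and movie pairs that actually carry a rating can be estimated directly. This is a small sketch that assumes the recommenderlab package is installed; the variable names n_rated, n_cells and share_rated are just illustrative.

```r
library(recommenderlab)
data("MovieLense")

# Ratings that are actually present, summed over all users.
n_rated <- sum(rowCounts(MovieLense))

# Total number of user x movie combinations in the matrix.
n_cells <- prod(dim(MovieLense))

# Fraction of the matrix that is filled in; the remainder is unrated.
share_rated <- n_rated / n_cells
share_rated
```

A small value here confirms what the bar plot suggests: most of the matrix is empty.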

The data needs to be cleaned before using it.

Let’s now remove all the unrated items from the vector:
vector_ratings2 <- vector_ratings[vector_ratings != 0]

We can now look at the ratings after the unrated items have been removed, for example with table(vector_ratings2).

Now split the data in order to train and test a model:

evlS <- evaluationScheme(MovieLense, method = "split", train = 0.9,
                         given = 12)

trg <- getData(evlS, "train")
test_known <- getData(evlS, "known")
test_unknown <- getData(evlS, "unknown")

The evaluationScheme function takes the data set as a rating matrix (here, the MovieLense data). We split the data with train set to 0.9, which is the proportion of users (rows) used to build the training model; the rest are used to test the model. The “given” parameter specifies how many ratings per test user are randomly selected and shown to the recommender. The recommended items are then compared to the rest of the ratings for that user.

The function getData is used to access the data. The parameter “train” returns the data used for training, “known” returns the ratings of the test users that are shown to the model when predicting, and “unknown” returns the held-back ratings used to evaluate those predictions.
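To see what the known and unknown parts look like for a single test user, here is a base R sketch of the same idea. The user, the movie names and the seed are invented for illustration, and the sampling only mimics what the evaluation scheme does conceptually.

```r
set.seed(42)  # arbitrary seed so the sketch is repeatable

# Hypothetical test user who has rated 20 movies on a 1 to 5 scale.
user_ratings <- setNames(sample(1:5, 20, replace = TRUE),
                         paste0("movie_", 1:20))

given <- 12

# Randomly pick 'given' ratings that the model is allowed to see.
known_items <- sample(names(user_ratings), given)
known <- user_ratings[known_items]

# Everything else is held back and used only to score the predictions.
unknown <- user_ratings[setdiff(names(user_ratings), known_items)]

length(known)
length(unknown)
```

This also shows why every test user needs more than 12 ratings when given is 12: otherwise nothing would be left to evaluate against.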

You can now create a recommender model with UBCF on the training data.

rcmnd_ub <- Recommender(trg, "UBCF")

With this model we can now create predictions for a given set of test users.

pred_ub <- predict(rcmnd_ub, test_known, type = "ratings"); pred_ub

You can now inspect the first few predicted ratings, for example with as(pred_ub, "matrix")[1:5, 1:5].
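Since the goal was to recommend movies to a specific user, it can also help to ask the model for a top-N list instead of a matrix of ratings. The following is a sketch that assumes recommenderlab is installed; it rebuilds the split and the model so that it runs on its own, and the names scheme, model, first_user and top5 as well as the seed are arbitrary choices.

```r
library(recommenderlab)
data("MovieLense")

set.seed(123)  # arbitrary seed so the random split is repeatable

# Same split and model as in the walkthrough.
scheme <- evaluationScheme(MovieLense, method = "split", train = 0.9,
                           given = 12)
model <- Recommender(getData(scheme, "train"), "UBCF")

# Known ratings of the first test user only.
first_user <- getData(scheme, "known")[1, ]

# Ask for a list of up to 5 movies instead of predicted ratings.
top5 <- predict(model, first_user, type = "topNList", n = 5)

# Convert the result to a plain list of movie titles.
top5_titles <- as(top5, "list")[[1]]
top5_titles
```

Because the split is random, the titles you get will depend on the seed and on which user ends up first in the test set.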

Finally, check the accuracy of your model by entering

acc_ub <- calcPredictionAccuracy(pred_ub, test_unknown)
as(acc_ub, "matrix")

RMSE is the root mean squared error. It can be calculated as sqrt(mean((predicted - actual)^2)).

Here the average of the squared differences between the estimated and true values is calculated, and its square root is taken. An ideal RMSE value would be zero, but this is usually not practical. The RMSE depends on the scale of the target values (the ratings passed to the calcPredictionAccuracy function), so on a rating scale of 1 to 5 an RMSE of 1.03 can be considered OK in our test case.
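As a quick worked example, the calculation can be done by hand in base R. The four predicted and actual ratings below are made up and are not output from the model above.

```r
# Hypothetical predicted and true ratings for four movies.
predicted <- c(4, 3, 5, 2)
actual    <- c(5, 3, 4, 4)

# Squared differences: 1, 0, 1, 4
squared_errors <- (predicted - actual)^2

# Mean of the squared errors is 1.5; RMSE is its square root.
rmse_value <- sqrt(mean(squared_errors))
rmse_value
```

The single prediction that is off by two stars contributes most of the error, which shows how RMSE penalises large misses more heavily than small ones.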

A recommendation engine is a powerful tool that can help your users find something they might want but do not know how to search for. I hope this post helps you get started with applying this tool in your own projects.
