abayesed

Reputation: 11

How to transform a list of user ratings into a matrix in R

I am working on a collaborative filtering problem, and I am having problems reshaping my raw data into a user-rating matrix. I am given a rating database with columns 'movie', 'user' and 'rating'. From this database, I would like to obtain a matrix of size #users x #movies, where each row indicates a user's ratings.

Here is a minimal working example:

# given this:
ratingDB <- data.frame(rbind(c(1,1,1),c(1,2,NA),c(1,3,0), c(2,1,1), c(2,2,1), c(2,3,0), 
                         c(3,1,NA), c(3,2,NA), c(3,3,1)))
names(ratingDB) <- c('user', 'movie', 'liked')

#how do I get this?
userRating <- matrix(data = rbind(c(1,NA,0), c(1,1,0), c(NA,NA,1)), nrow=3)

I can solve the problem with two nested for loops, but that does not scale well. Can anybody help me with a vectorized solution?

Upvotes: 1

Views: 2954

Answers (1)

Sven Hohenstein

Reputation: 81713

This can be done without any loop. It works with the function matrix:

# sort the 'liked' values by user, then movie (not necessary for the example data, which is already in this order)
vec <- with(ratingDB, liked[order(user, movie)])

# create a matrix
matrix(vec, nrow = length(unique(ratingDB$user)), byrow = TRUE)

     [,1] [,2] [,3]
[1,]    1   NA    0
[2,]    1    1    0
[3,]   NA   NA    1

This reshapes the sorted vector of ratingDB$liked values into a matrix with one row per user and one column per movie. The argument byrow = TRUE fills the matrix row by row (the default is to fill it column by column).
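To illustrate why the sorting step matters, here is a minimal sketch with a shuffled copy of the example data. The object names shuffled, wrong and right are only for illustration and are not part of the original question.

```r
# Same example data as in the question, rebuilt so the block runs on its own
ratingDB <- data.frame(
  user  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  movie = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  liked = c(1, NA, 0, 1, 1, 0, NA, NA, 1)
)

# Hypothetical shuffled copy: same rows, arbitrary order
shuffled <- ratingDB[c(9, 2, 5, 1, 7, 3, 8, 4, 6), ]

# Without sorting, the values end up in the wrong cells
wrong <- matrix(shuffled$liked, nrow = 3, byrow = TRUE)

# Sorting by user, then movie, puts each value back in its intended cell
vec   <- with(shuffled, liked[order(user, movie)])
right <- matrix(vec, nrow = length(unique(shuffled$user)), byrow = TRUE)
```

Note that this approach assumes every user/movie combination appears exactly once in the data frame; otherwise the vector does not line up with the cells of the matrix.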


Update: What to do if the NA cases are not in the data frame? (see comment by @steffen)

First, remove the rows containing NA:

subDB <- ratingDB[complete.cases(ratingDB), ]

  user movie liked
1    1     1     1
3    1     3     0
4    2     1     1
5    2     2     1
6    2     3     0
9    3     3     1

The full data frame can be reconstructed. The function expand.grid is used to generate all combinations of user and movie:

full <- setNames(with(subDB, expand.grid(sort(unique(user)), sort(unique(movie)))),
                 c("user", "movie"))

  user movie
1    1     1
2    2     1
3    3     1
4    1     2
5    2     2
6    3     2
7    1     3
8    2     3
9    3     3
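One caveat: unique(user) and unique(movie) taken from subDB only find IDs that still have at least one rating. If the complete ID lists are known from elsewhere, they could be passed to expand.grid directly. The sketch below assumes such lists exist; all_users, all_movies and user 4 are hypothetical. It also names the arguments, which sets the column names without setNames.

```r
# Ratings with the NA rows already removed, matching subDB above
subDB <- data.frame(
  user  = c(1, 1, 2, 2, 2, 3),
  movie = c(1, 3, 1, 2, 3, 3),
  liked = c(1, 0, 1, 1, 0, 1)
)

# IDs recovered from the data: only users/movies with at least one rating
full_from_data <- expand.grid(user  = sort(unique(subDB$user)),
                              movie = sort(unique(subDB$movie)))

# Assumption: the complete ID lists are known from elsewhere.
# User 4 is hypothetical and has no ratings at all.
all_users  <- c(1, 2, 3, 4)
all_movies <- c(1, 2, 3)

# The first argument of expand.grid varies fastest
full_known <- expand.grid(user = all_users, movie = all_movies)
```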

Now, the information of the sub data frame subDB and the full combination data frame full can be combined with the merge function:

ratingDB_2 <- merge(full, subDB, all = TRUE)

  user movie liked
1    1     1     1
2    1     2    NA
3    1     3     0
4    2     1     1
5    2     2     1
6    2     3     0
7    3     1    NA
8    3     2    NA
9    3     3     1

The result has the same content as the original data frame ratingDB, and merge returns it sorted by user and movie. Hence, the same procedure can be applied to transform it into a matrix of liked values:

matrix(ratingDB_2$liked, nrow = length(unique(ratingDB_2$user)), byrow = TRUE)

     [,1] [,2] [,3]
[1,]    1   NA    0
[2,]    1    1    0
[3,]   NA   NA    1
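As a possible alternative that does not depend on row order or on reconstructing the missing combinations, the ratings could be written into a pre-allocated NA matrix using a two-column index matrix. This is only a sketch of a different technique, not the method used above; the names users, movies and idx are illustrative.

```r
# Ratings with the NA rows already removed, matching subDB above
subDB <- data.frame(
  user  = c(1, 1, 2, 2, 2, 3),
  movie = c(1, 3, 1, 2, 3, 3),
  liked = c(1, 0, 1, 1, 0, 1)
)

users  <- sort(unique(subDB$user))
movies <- sort(unique(subDB$movie))

# Start with a matrix of NAs: one row per user, one column per movie
userRating <- matrix(NA_real_,
                     nrow = length(users), ncol = length(movies),
                     dimnames = list(user = users, movie = movies))

# Each row of idx is a (row, column) position; match() maps IDs to positions,
# so the IDs do not have to be consecutive integers
idx <- cbind(match(subDB$user, users), match(subDB$movie, movies))
userRating[idx] <- subDB$liked
```

Cells without a rating simply keep their initial NA, so no merge step is needed.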

Upvotes: 3
