Dataminer

Reputation: 1549

Creating a User-Item Matrix for Collaborative Filtering

I am trying to run a collaborative filtering (CF) algorithm on "User-Item-Rating" data. My data is in long format, i.e., each row holds one user's rating of one specific item. I need to convert it into a "User-Item" matrix before I can apply a CF algorithm to it.

I am using the spread() function from the tidyr package for this. But with more than 50k unique items, the result is a huge data frame with one column per item. R cannot build it on my local machine and fails with a "cannot allocate vector of size" error.
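A rough back-of-the-envelope estimate shows why a dense layout runs out of memory. The question does not state how many users there are, so the 10,000 users below is a hypothetical figure chosen only for illustration, and the estimate counts just the 8-byte numeric cells of a dense matrix (data-frame overhead would add more).

```r
# Hypothetical sizes: the question only says "more than 50k unique items";
# n_users is an assumed value for illustration.
n_users <- 10000L
n_items <- 50000L

# A dense numeric matrix stores every cell as an 8-byte double,
# whether or not the user actually rated the item.
dense_bytes <- n_users * n_items * 8
dense_gib <- dense_bytes / 2^30

cat(sprintf("Dense %d x %d matrix: %.2f GiB\n", n_users, n_items, dense_gib))
```

Since a typical user rates only a tiny fraction of 50k items, nearly all of those cells would hold "no rating", which is exactly the waste a sparse representation avoids.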

What's the best way to deal with this? I explored a few options but was unable to get any of them to work.

Any help will be greatly appreciated.

Thanks!

Upvotes: 0

Views: 1708

Answers (1)

lukeA

Reputation: 54247

Since your data is probably sparse (each user rates only a few of the items), use a sparse matrix. Here's an example with 50,000 randomly generated ratings:

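library(stringi)
library(Matrix)

set.seed(1)
# Simulated long-format data: one row per (user, item, rating).
# item and user must be factors so that their integer codes
# can serve as row/column indices (since R 4.0, data.frame()
# no longer converts strings to factors automatically).
df <- data.frame(
  item   = factor(stri_rand_strings(50000, 4)),
  user   = factor(1:50000),
  rating = sample(1:10, 50000, replace = TRUE)
)

# Build a users x items sparse matrix; only the non-zero ratings are stored
m <- sparseMatrix(
  i = as.integer(df$user),
  j = as.integer(df$item),
  x = df$rating,
  dims = c(nlevels(df$user), nlevels(df$item)),
  dimnames = list(levels(df$user), levels(df$item))
)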
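To see how the factor codes map onto rows and columns, here is the same sparseMatrix() call applied to a tiny hand-made ratings table (users u1 to u3 and items A to C are hypothetical). Unrated user-item pairs are not stored and read back as 0, so in this sketch 0 means "no rating" and should not be confused with an actual rating of zero.

```r
library(Matrix)

# Tiny hypothetical long-format ratings table
ratings <- data.frame(
  user   = factor(c("u1", "u1", "u2", "u3", "u3")),
  item   = factor(c("A",  "C",  "B",  "A",  "B")),
  rating = c(5, 3, 4, 2, 1)
)

# Factor codes become row (user) and column (item) indices
m <- sparseMatrix(
  i = as.integer(ratings$user),
  j = as.integer(ratings$item),
  x = ratings$rating,
  dims = c(nlevels(ratings$user), nlevels(ratings$item)),
  dimnames = list(levels(ratings$user), levels(ratings$item))
)

print(m)        # unrated cells are shown as "."
m["u1", "C"]    # the rating u1 gave item C
m["u2", "A"]    # u2 never rated A, so this reads back as 0
```

Note that sparseMatrix() adds up the x values of repeated (i, j) pairs, so if a user can rate the same item more than once, deduplicate those rows (for example, keep only the latest rating) before building the matrix.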

Upvotes: 1
