Reputation: 1549
I am attempting to run a Collaborative Filtering (CF) algorithm on "User-Item-Rating" data. My data is in long format, i.e. each row holds one user's rating of one specific item. I need to convert this into a "User-Item" matrix before I can apply a CF algorithm to it.
I am using the spread function from the tidyr package for this task. But given that I have more than 50k unique items, the resulting data frame would be huge. R is unable to execute this (on my local machine) and throws the "cannot allocate vector of size" error.
What's the best way to deal with this? Some of the options I tried exploring, but was unable to get them to work:
recommenderlab apparently has an option to deal with this, but I could not find it.
Any help will be greatly appreciated.
Thanks!
Upvotes: 0
Views: 1708
Reputation: 54247
Since you (probably) have sparse data, go with a sparse matrix. Here's an example with 50,000 sparse example ratings:
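library(stringi)
library(Matrix)
set.seed(1)
# 50,000 example ratings: one random 4-character item id per user.
# as.factor() is explicit because data.frame() no longer converts
# strings to factors by default in R >= 4.0.
df <- data.frame(item = as.factor(stri_rand_strings(50000, 4)))
df$user <- as.factor(seq_len(nrow(df)))
df$rating <- sample(1:10, nrow(df), replace = TRUE)
# Factor codes serve as row/column indices; only the rated cells are stored.
m <- sparseMatrix(
  i = as.integer(df$user),
  j = as.integer(df$item),
  x = df$rating,
  dimnames = list(levels(df$user), levels(df$item))
)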
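The same idea applied to a small long-format table with repeated users and items may make the mapping easier to follow. This is only a sketch: the data frame `ratings` and its column names `user`, `item` and `rating` are made up for illustration and are not taken from the question's data.

```r
library(Matrix)

# Hypothetical long-format ratings (names and values are assumptions).
ratings <- data.frame(
  user   = c("u1", "u1", "u2", "u3", "u3"),
  item   = c("a", "b", "b", "a", "c"),
  rating = c(5, 3, 4, 2, 1),
  stringsAsFactors = FALSE
)

# Convert ids to factors so their integer codes can index rows and columns.
users <- factor(ratings$user)
items <- factor(ratings$item)

# Build the user-by-item matrix; unrated cells are never materialized.
# Note: sparseMatrix() sums x values that share the same (i, j) pair,
# so duplicate user-item rows should be resolved beforehand.
m <- sparseMatrix(
  i = as.integer(users),
  j = as.integer(items),
  x = ratings$rating,
  dims = c(nlevels(users), nlevels(items)),
  dimnames = list(levels(users), levels(items))
)

dim(m)        # number of users by number of items
nnzero(m)     # number of stored ratings
m["u3", "c"]  # look up a single rating by user and item name
```

If recommenderlab is the target, its documentation describes coercing such a matrix with `as(m, "realRatingMatrix")`; treat that as something to verify against the installed version rather than a guarantee.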
Upvotes: 1