Reputation: 23
I have a set of ratings for 45,000 users and 40-odd movies. I need to predict new ratings for each user based on their Pearson correlation with other users. I also need to store the set of similar users and their similarities for each user-movie combination. I am using the foreach package to execute the loops in parallel. The code that I have managed to write is this:
library(foreach)
library(doParallel)  # provides makeCluster() (via parallel) and registerDoParallel()
x <- matrix(rnorm(1000), nrow = 100, ncol = 10)
df = list()
# correlation matrix
cor_mat <- cor(t(x))
cor_mat = abs(cor_mat)
# similarity limits
upper = 1
lower = 0.04
# Initiating parallel environment
cl = makeCluster(3)
registerDoParallel(cl)
res <- foreach(i = 1:nrow(x), .combine = rbind, .packages = c('base', 'foreach')) %dopar% {
  foreach(j = 1:ncol(x), .combine = c, .packages = c('base', 'foreach')) %do% {
    sim_user = which(cor_mat[i, ] >= lower & cor_mat[i, ] < upper)
    bx = as.numeric(t(x[sim_user, j]) %*%
      cor_mat[sim_user, j] / sum(cor_mat[sim_user, j]))
    df[[length(df) + 1]] = data.frame(i, j, sim_user, cor_mat[sim_user, j])
    return(bx)
  }
}
stopCluster(cl)
I am able to accomplish half of my task, i.e. creating a matrix of predicted ratings from the foreach output 'res'. But my list df, to which I append the similar users, is empty at the end of the foreach loop.
What customized combine function can be written to output both the matrix of predicted ratings and the list of similar users?
Upvotes: 2
Views: 3570
Reputation: 680
When a loop body has to produce several outputs, return them together in a list and write your own functions to combine the results. Your df stays empty because each %dopar% worker runs in its own R process and only modifies its own copy of df; the only thing sent back to the master session is the value returned from the loop body. Here, I return two elements each time: bx and df. My combine functions therefore combine each of those two elements separately and return them in a length-2 list.
combine_custom_j <- function(LL1, LL2) {
  bx <- c(LL1$bx, LL2$bx)
  dfs <- c(LL1$df, LL2$df)
  return(list(bx = bx, df = dfs))
}
combine_custom_i <- function(LL1, LL2) {
  bx <- rbind(LL1$bx, LL2$bx)
  dfs <- c(LL1$df, LL2$df)
  return(list(bx = bx, df = dfs))
}
res <- foreach(i = 1:nrow(x), .combine = combine_custom_i, .packages = c('base', 'foreach')) %dopar% {
  foreach(j = 1:ncol(x), .combine = combine_custom_j, .packages = c('base', 'foreach')) %do% {
    sim_user = which(cor_mat[i, ] >= lower & cor_mat[i, ] < upper)
    bx = as.numeric(t(x[sim_user, j]) %*%
      cor_mat[sim_user, j] / sum(cor_mat[sim_user, j]))
    # wrap the data frame in list() so that c() in the combine functions
    # appends whole data frames instead of flattening them into their columns
    return(list(bx = bx, df = list(data.frame(i, j, sim_user, cor_mat[sim_user, j]))))
  }
}
This returns the data frames in a list, as your code suggested, but I believe you might want to rbind them instead? In that case, return the bare data frame from the loop body (df = data.frame(...), without the list() wrapper) and replace c(LL1$df, LL2$df) with rbind(LL1$df, LL2$df) in both combine functions.
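To see the rbind variant in isolation, here is a minimal sketch on toy data. It uses a sequential %do% loop so that it runs without a cluster, and the names combine_pair and toy are made up for this illustration; they are not part of the code above.

```r
library(foreach)

# Hypothetical toy combine function: merges the numeric part with c()
# and the data-frame part with rbind().
combine_pair <- function(a, b) {
  list(bx = c(a$bx, b$bx), df = rbind(a$df, b$df))
}

# Sequential %do% loop, so no parallel backend is needed for this sketch.
# Each iteration returns a length-2 list: one number and one data frame.
toy <- foreach(k = 1:3, .combine = combine_pair) %do% {
  list(bx = k^2, df = data.frame(k = k, half = k / 2))
}

toy$bx        # numeric vector with one value per iteration
nrow(toy$df)  # one row per iteration
```

The same combine function should work unchanged with %dopar% once a backend is registered, since only the returned list travels back from the workers.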
Upvotes: 4