Reputation: 740
I am toying around with building a recommender system. I have the historical purchases of certain users.
My data looks as
> head(baskets)
# A tibble: 6 x 2
# Groups: user_id [2]
user_id basket
<int> <list>
1 8 <int [21]>
2 8 <int [13]>
3 8 <int [15]>
4 12 <int [22]>
5 12 <int [20]>
6 12 <int [17]>
> baskets$basket[[1]]
[1] 651 1529 2078 6141 6473 9839 14992 16349 17794 20920 21903
[12] 23165 23400 24838 28985 32030 34190 39110 39812 44099 49533
Okay so now I want to remove one item from each basket and save it as the target item, and save the rest of the basket as the new basket. This is to be repeated for all items in the basket. If we had for example a user with user_id = 1 and basket = [1,2,3] we would get
user_id basket target
1 2,3 1
1 1,3 2
1 1,2 3
How can I construct such a data.frame / tibble in an efficient way? I have a solution but it seems to work quite slow, and since I have a large amount of data I would like to find a better solution if possible.
Currently I have
orderdf <- data.frame(user_id = integer(), basket = list(), target =
integer())
for(k in 1:dim(baskets)[1]){
print(k)
currbasket <- baskets$basket[[k]]
currbaskets <- lapply(1:length(currbasket), function(i) currbasket[i])
curruser <- baskets$user_id[k]
for(j in 1:length(currbaskets)){
tempdf <- tibble(user_id = baskets$user_id[k], basket =
list(currbaskets[[j]]), target = currbasket[j])
orderdf <- rbind(orderdf, tempdf)
}
}
Upvotes: 0
Views: 42
Reputation: 1943
First I create myself a reproductable dataset
baskets <- data.frame(user_id = 1:10)
for (i in 1:nrow(df)){
baskets$basket[i] = list(sample(1:100, 3, replace=F))
}
head(baskets)
Next time, please provide a reproductable set!
The next thing to do is to build a function to handle one line:
x = baskets[1,]
x$basket = x$basket[[1]]
require(data.table)
foraline <- function(x){
n_inbasket <- length(unlist(x$basket))
result <- data.table(user_id = rep(x$user_id, n_inbasket))
result$basket <- sapply(1:n_inbasket, function(i){list(unlist(x$basket)[-i])})
result$target <- x$basket
return(result)
}
foraline(x)
Ok and now, we apply it on all lines and reduce it in one data.frame using rbindlist
from data.table package.
require(data.table)
order_basket <- rbindlist(apply(baskets, 1, foraline))
head(order_basket)
Upvotes: 1