Reputation: 87
I have a data frame of different users (USER). Each user has different items (ITEM) on different dates (DATE):
USER DATE ITEM
A 1 alpha
A 1 beta
A 1 gamma
A 2 alpha
A 2 gamma
A 4 beta
A 4 gamma
B 1 alpha
B 1 beta
...
For combinations of items of different lengths, I want to count the number of users that have each particular combination.
Output should be like this:
amount_of_users combination_of_items
2 (alpha,beta)
1 (alpha,gamma)
1 (beta,gamma)
1 (alpha,beta,gamma)
If a user has the item alpha, he should appear in every counted 2-, 3-, or 4-item combination that includes it, since he clearly got that item together with the other items, but only when they were received on the same day.
UPDATE: As DWin correctly pointed out, it was not clear what I am trying to achieve. Suppose a user has the items alpha, beta, and gamma. Then this user should be added to the count of every subset of those items, meaning the combinations (alpha,beta), (beta,gamma), (alpha,gamma), and finally (alpha,beta,gamma) all get count + 1.
In the meantime I realized that for my main goal (seeing which ITEMs are most often added to a specific ITEM, e.g. alpha) I could simply count the number of users with table and colSums. Below is my rather crude solution, which does at least show the items that are added most often:
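For a single user, the subsets described in the update can be enumerated with base R's utils::combn(), which can apply a function (here paste) to each combination. A minimal sketch, assuming the items one user had on one day are already in a character vector; the name `items` is hypothetical:

```r
# hypothetical items held by one user on one day
items <- c("alpha", "beta", "gamma")
# every subset with at least two items, written as a comma-separated label
subsets <- unlist(lapply(2:length(items), function(k)
  utils::combn(items, k, FUN = paste, collapse = ",")))
print(subsets)
```

Each of these labels would then get count + 1 for this user.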
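# merge several TARGETGROUP levels into a single level "HOME"
levels(x$TARGETGROUP)[c(8,15:17,39,41,57,58,61)] <- c("HOME")
levels(x$TARGETGROUP)
# cross-tabulate column 1 (users) against column 3 (items)
dings <- table(x[,1],x[,3])
str(dings)
# the 8th column contains the item I needed;
# sum the counts of every item over the users who have that item
haeuf <- colSums(dings[dings[,8]!=0, ])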
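The same idea can be made independent of column positions by selecting the target item by name, and by turning counts into presence/absence first, so that colSums() counts users rather than item occurrences. A sketch on the question's sample data; the variable `target` and the choice of "alpha" are assumptions, and dates are pooled as in the code above:

```r
# sample data from the question
x <- data.frame(
  USER = c("A", "A", "A", "A", "A", "A", "A", "B", "B"),
  DATE = c(1, 1, 1, 2, 2, 4, 4, 1, 1),
  ITEM = c("alpha", "beta", "gamma", "alpha", "gamma", "beta", "gamma", "alpha", "beta")
)
target <- "alpha"  # hypothetical target item
# users x items matrix of occurrence counts (dates are pooled)
dings <- unclass(table(x$USER, x$ITEM))
# TRUE if the user ever had the item
has <- dings > 0
# among users who have the target, count how many also have each item
haeuf <- colSums(has[has[, target], , drop = FALSE])
# most frequent companions of the target first
sort(haeuf[names(haeuf) != target], decreasing = TRUE)
```

On the sample data, beta accompanies alpha for two users and gamma for one.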
Upvotes: 0
Views: 2715
Reputation: 1471
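I think lala88 also wants the frequencies; one solution could be:
library(combinat)  # its combn() takes a lower-case 'fun' argument
# dd: the question's data frame (columns USER, DATE, ITEM); dates are ignored
# largest number of distinct items held by a single user
m <- max(sapply(split(dd, f=dd$USER), function(x) length(unique(x[, "ITEM"]))))
# count how many users hold each combination of i items
fun <- function(i, dd){
  # keep only users with at least i distinct items
  ind <- sapply(split(dd, f=dd$USER), function(x) length(unique(x[, "ITEM"])) >= i)
  res <- lapply(split(dd, f=dd$USER)[ind],
                # sort the items so each combination always gets the same label
                function(x) combn(sort(unique(as.character(x$ITEM))), i,
                                  simplify = FALSE,
                                  fun = paste, collapse = " "))
  table(unlist(res))
}
lapply(2:m, fun, dd=dd)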
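For comparison, a sketch of the same counting without the combinat package: base R's utils::combn() also applies a function to each combination, but the argument is spelled FUN in upper case. Like the code above, it pools items across dates; the helper names `items_by_user` and `count_combos` are hypothetical:

```r
dd <- data.frame(
  USER = c("A", "A", "A", "A", "A", "A", "A", "B", "B"),
  DATE = c(1, 1, 1, 2, 2, 4, 4, 1, 1),
  ITEM = c("alpha", "beta", "gamma", "alpha", "gamma", "beta", "gamma", "alpha", "beta")
)
# distinct items per user, pooled over all dates and sorted so that
# the same combination always gets the same label
items_by_user <- lapply(split(as.character(dd$ITEM), dd$USER),
                        function(v) sort(unique(v)))
count_combos <- function(i) {
  # only users with at least i distinct items can contribute
  eligible <- items_by_user[lengths(items_by_user) >= i]
  labels <- lapply(eligible, function(v)
    utils::combn(v, i, FUN = paste, collapse = " "))
  table(unlist(labels))
}
m <- max(lengths(items_by_user))
res <- lapply(2:m, count_combos)
res
```

Because each user contributes a set of distinct items, every user is counted at most once per combination.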
There is still room for improving my code, so feel free to edit it.
Upvotes: 0
Reputation: 32351
One can also use the arules package.
# Data
d0<- read.delim( textConnection("USER DATE ITEM
A 1 alpha
A 1 beta
A 1 gamma
A 2 alpha
A 2 gamma
A 4 beta
A 4 gamma
B 1 alpha
B 1 beta"), sep=" ")
# Reshape the data and compute all the itemsets
library(arules)
library(reshape2)
d <- dcast( USER ~ ITEM, data = d0 )[,-1] > 0
r <- apriori( d, par = list(target="frequent itemsets", support = 0, minlen=2) )
# Display the results
inspect(r)
as( r, "data.frame" )
within( as( r, "data.frame" ), { count = support * nrow(d) } )
# items support count
# 1 {beta,gamma} 0.5 1
# 2 {alpha,gamma} 0.5 1
# 3 {alpha,beta} 1.0 2
# 4 {alpha,beta,gamma} 0.5 1
This does not take the date into account. If you want to separate itemsets by date as well as by user (count is then the number of user-day pairs rather than the number of users):
d <- dcast( USER + DATE ~ ITEM, data = d0, fun.aggregate=length )[,-(1:2)] > 0
r <- apriori( d, par = list(target="frequent itemsets", support = 0, minlen=2) )
within( as( r, "data.frame" ), { count = support * nrow(d) } )
# items support count
# 1 {alpha,beta} 0.50 2
# 2 {alpha,gamma} 0.50 2
# 3 {beta,gamma} 0.50 2
# 4 {alpha,beta,gamma} 0.25 1
Upvotes: 2
Reputation: 263301
Is this what is needed?
aggregate(dd$ITEM,
by= dd[, c('USER','DATE')],
FUN=function(x) list(as.character(x)) )
USER DATE x
1 A 1 alpha, beta, gamma
2 B 1 alpha, beta
3 A 2 alpha, gamma
4 A 4 beta, gamma
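Going one step further, these per-user, per-day item lists can be expanded into the counts the question asks for: every subset of at least two items received on the same day, with each user counted at most once per combination. A minimal base R sketch on the question's sample data; the names `keys`, `combos`, and `counts` are illustrative:

```r
dd <- data.frame(
  USER = c("A", "A", "A", "A", "A", "A", "A", "B", "B"),
  DATE = c(1, 1, 1, 2, 2, 4, 4, 1, 1),
  ITEM = c("alpha", "beta", "gamma", "alpha", "gamma", "beta", "gamma", "alpha", "beta")
)
# one row per USER/DATE pair
keys <- unique(dd[, c("USER", "DATE")])
combos <- do.call(rbind, lapply(seq_len(nrow(keys)), function(r) {
  # items this user had on this day, sorted so labels are canonical
  same_day <- dd$USER == keys$USER[r] & dd$DATE == keys$DATE[r]
  items <- sort(unique(as.character(dd$ITEM[same_day])))
  if (length(items) < 2) return(NULL)
  # every subset with at least two items
  sets <- unlist(lapply(2:length(items), function(k)
    utils::combn(items, k, FUN = paste, collapse = ",")))
  data.frame(USER = keys$USER[r], combination = sets)
}))
# a user counts once per combination, however many days it occurred on
combos <- unique(combos)
counts <- table(combos$combination)
counts
```

On the sample data this gives 2 for alpha,beta and 1 each for alpha,gamma, beta,gamma, and alpha,beta,gamma, matching the expected output in the question.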
(The last paragraph made no sense to me.)
Upvotes: 0