Mustafa

Reputation: 87

Count common items between different users

I have a data frame of users (USER). Each user has several items (ITEM):

USER DATE ITEM
A 1 alpha
A 1 beta
A 1 gamma
A 2 alpha
A 2 gamma
A 4 beta
A 4 gamma
B 1 alpha
B 1 beta
...

For combinations of items of different lengths, I want to count the number of users that have each particular combination.

Output should be like this:

amount_of_users combination_of_items
2 (alpha,beta)
1 (alpha,gamma)
1 (beta,gamma)
1 (alpha, beta, gamma)

A user who has the item alpha should be counted in every 2-, 3-, or 4-item combination containing alpha, since he clearly got that item together with the other items, provided they were all on the same day.

UPDATE: As DWin correctly pointed out, it was not clear what I am trying to achieve. Suppose one user has the items alpha, beta, gamma. Then this user should be added to the count of every subset of those items with at least two elements, meaning the combinations (alpha,beta), (beta,gamma), (alpha,gamma) and finally (alpha,beta,gamma) each get count+1.
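To make the counting rule in the update concrete, here is a minimal base R sketch. It assumes the data frame is called `dd`, contains only the rows shown in the question, and ignores DATE; `item_subsets` is a hypothetical helper name, not something from the question.

```r
# Toy data copied from the rows shown in the question.
dd <- data.frame(
  USER = c("A", "A", "A", "A", "A", "A", "A", "B", "B"),
  DATE = c(1, 1, 1, 2, 2, 4, 4, 1, 1),
  ITEM = c("alpha", "beta", "gamma", "alpha", "gamma",
           "beta", "gamma", "alpha", "beta"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all subsets of size >= 2 of one user's distinct items.
item_subsets <- function(items) {
  # Sort so that every subset gets a single canonical label.
  items <- sort(unique(as.character(items)))
  if (length(items) < 2) return(character(0))
  unlist(lapply(2:length(items), function(k) {
    utils::combn(items, k, FUN = paste, collapse = ",")
  }))
}

# One vector of subset labels per user, then count users per label.
per_user <- lapply(split(dd$ITEM, dd$USER), item_subsets)
counts <- table(unlist(per_user))
result <- data.frame(
  amount_of_users = as.vector(counts),
  combination_of_items = names(counts),
  stringsAsFactors = FALSE
)
result
```

Because each user contributes each subset at most once, the table counts users rather than rows. Note that the number of subsets grows exponentially with the number of items per user, so this is only practical when users have few distinct items.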

In the meantime I realized that for my main goal (I want to see which ITEMs are most often added alongside a specific ITEM, e.g. alpha) I could just count the number of users with table and colSums. Below is my very rough solution, which at least indicates the items that are added most often.

levels(x$TARGETGROUP)[c(8,15:17,39,41,57,58,61)] <- c("HOME")
levels(x$TARGETGROUP)
dings <- table(x[,1],x[,3])
str(dings)
# I saw that the 8th column contains the item I needed.
haeuf <- colSums(dings[dings[,8]!=0, ])
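The same table/colSums idea can be sketched on the toy data, selecting the target item by name rather than by column position. This is only an illustration: the real data frame `x` is not shown in the question, so `dd`, `target`, and the choice of alpha are assumptions.

```r
# Toy data copied from the rows shown in the question.
dd <- data.frame(
  USER = c("A", "A", "A", "A", "A", "A", "A", "B", "B"),
  DATE = c(1, 1, 1, 2, 2, 4, 4, 1, 1),
  ITEM = c("alpha", "beta", "gamma", "alpha", "gamma",
           "beta", "gamma", "alpha", "beta"),
  stringsAsFactors = FALSE
)

target <- "alpha"  # item of interest, chosen here for illustration

# User-by-item incidence matrix: TRUE if the user has the item at least once.
has_item <- unclass(table(dd$USER, dd$ITEM)) > 0

# Keep only the users who have the target, then count users per item.
with_target <- has_item[has_item[, target], , drop = FALSE]
companions <- colSums(with_target)

# Drop the target itself and put the most frequent companions first.
companions <- sort(companions[names(companions) != target], decreasing = TRUE)
companions
```

Converting the counts to TRUE/FALSE first means each user is counted once per item, no matter how many rows they have for it.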

Upvotes: 0

Views: 2715

Answers (3)

holzben

Reputation: 1471

I think lala88 also wants the frequencies; one solution could be:

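library(combinat)  # provides a combn() that takes a lower-case `fun` argument

# Largest number of distinct items held by any single user
m <- max(sapply(split(dd, f = dd$USER), function(x) length(unique(x[, "ITEM"]))))

# Tabulate every i-item combination across the users who have at least i items
fun <- function(i, dd) {
  ind <- sapply(split(dd, f = dd$USER), function(x) length(unique(x[, "ITEM"])) >= i)
  # Sort the items so the same combination always gets the same label
  res <- lapply(split(dd, f = dd$USER)[ind],
                function(x) combn(sort(unique(as.character(x$ITEM))), i,
                                  simplify = FALSE,
                                  fun = paste, collapse = " "))
  table(unlist(res))
}

# One frequency table per combination size, from pairs up to m items
lapply(2:m, fun, dd = dd)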

There is still room for improving my code... so feel free to make an edit...

Upvotes: 0

Vincent Zoonekynd

Reputation: 32351

One can also use the arules package.

# Data
d0<- read.delim( textConnection("USER DATE ITEM
A 1 alpha
A 1 beta
A 1 gamma
A 2 alpha
A 2 gamma
A 4 beta
A 4 gamma
B 1 alpha
B 1 beta"), sep=" ")

# Reshape the data and compute all the itemsets
library(arules)
library(reshape2)
d <- dcast( USER ~ ITEM, data = d0 )[,-1] > 0
r <- apriori( d, par = list(target="frequent itemsets", support = 0, minlen=2) )

# Display the results
inspect(r)
as( r, "data.frame" )
within( as( r, "data.frame" ), { count = support * nrow(d) } )
#                items support count
# 1       {beta,gamma}     0.5     1
# 2      {alpha,gamma}     0.5     1
# 3       {alpha,beta}     1.0     2
# 4 {alpha,beta,gamma}     0.5     1

This does not take the date into account. If you want to separate itemsets by date as well as users:

d <- dcast( USER + DATE ~ ITEM, data = d0, fun.aggregate=length )[,-(1:2)] > 0
r <- apriori( d, par = list(target="frequent itemsets", support = 0, minlen=2) )
within( as( r, "data.frame" ), { count = support * nrow(d) } )
#                items support count
# 1       {alpha,beta}    0.50     2
# 2      {alpha,gamma}    0.50     2
# 3       {beta,gamma}    0.50     2
# 4 {alpha,beta,gamma}    0.25     1
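As a rough cross-check of the pair counts above without arules, the cross-product of a basket-by-item incidence matrix gives the number of baskets that contain each pair of items. This sketch treats every USER/DATE combination as one basket and only covers pairs, not larger itemsets; the names `dd`, `basket`, `incidence`, and `pair_counts` are made up for the example.

```r
# Toy data copied from the rows shown in the question.
dd <- data.frame(
  USER = c("A", "A", "A", "A", "A", "A", "A", "B", "B"),
  DATE = c(1, 1, 1, 2, 2, 4, 4, 1, 1),
  ITEM = c("alpha", "beta", "gamma", "alpha", "gamma",
           "beta", "gamma", "alpha", "beta"),
  stringsAsFactors = FALSE
)

# One basket per user and day.
basket <- paste(dd$USER, dd$DATE)

# Basket-by-item incidence matrix with 0/1 entries.
incidence <- (unclass(table(basket, dd$ITEM)) > 0) * 1

# Entry [i, j] is the number of baskets that contain both item i and item j;
# the diagonal holds the number of baskets that contain each single item.
pair_counts <- crossprod(incidence)
pair_counts
```

The off-diagonal entries can be compared with the count column of the 2-item itemsets above.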

Upvotes: 2

IRTFM

Reputation: 263301

Is this what is needed?

aggregate(dd$ITEM,
          by = dd[, c('USER', 'DATE')],
          FUN = function(x) list(as.character(x)))

  USER DATE                  x
1    A    1 alpha, beta, gamma
2    B    1        alpha, beta
3    A    2       alpha, gamma
4    A    4        beta, gamma

(The last paragraph made no sense to me.)

Upvotes: 0
