Reputation: 1793
I have a set of data where a transaction id always has at least 1 item from both A and B. I want to calculate frequencies where the items from category A are tabulated with respect to those of category B.
Input looks like:
id = c(1,1,2,2,2,2,3,3,3,3)
cat = c("A","B","A","A","A","B","B","A","A","B")
item = c("Item 1","Item 30","Item 2","Item 3",
"Item 1","Item 30","Item 31","Item 1","Item 2","Item 32")
df = data.frame(id,cat,item)
Id Cat Item
1 A Item 1
1 B Item 30
2 A Item 2
2 A Item 3
2 A Item 1
2 B Item 30
3 B Item 31
3 A Item 1
3 A Item 2
3 B Item 32
The output I'm looking for is
Item30 Item31 Item 32
Item1 2 1 1
Item2 1 1 1
Item3 1
I have a solution where I can loop through the unique values of each category, but is there a cleaner solution which avoids the loop entirely?
Edit - Fixed the example which was missing a row. And to clarify, the category {A,B} relates to which category the item belongs to
Upvotes: 1
Views: 50
Reputation: 26446
You appear to be doing
with(merge(df[cat=='A',],df[cat=='B',],by='id',all=TRUE),
table(droplevels(item.x),droplevels(item.y)))
Item 30 Item 31 Item 32 Item 1 2 1 1 Item 2 1 1 1 Item 3 1 0 0
Upvotes: 3
Reputation: 561
I believe you are looking to group by id and cat.
library(reshape2)
dcast(df,id+cat~item,value.var = "item",fun.aggregate =length)
Upvotes: 0