Reputation: 478
I'm struggling to solve this problem in R. I have data like this:
item id
1 500
2 500
2 600
2 700
3 500
3 600
data.frame(item = c(1, 2, 2, 2, 3, 3),
id = c(500, 500, 600, 700, 500, 600))
And I want to count the number of times a pair of items is linked to the same id. So I want this output:
item1 item2 count
1 2 1
2 3 2
1 3 2
I've tried approaching this with commands like:
x_agg = aggregate(x, by=list(x$id), c)
and then
x_agg_id = lapply(x_agg$item, unique)
thinking that I could then count the occurrence of each item. But the by
function seems to create an object of lists, which I don't know how to manipulate. I am hoping there is a simpler way....
Upvotes: 5
Views: 737
Reputation: 15441
# your data
df<-read.table(text="item id
1 500
2 500
2 600
2 700
3 500
3 600",header=TRUE)
library(tnet)
item_item<-projecting_tm(df, method="sum")
names(item_item)<-c("item1","item2","count")
item_item
#item1 item2 count
#1 1 2 1
#2 1 3 1
#3 2 1 1
#4 2 3 2
#5 3 1 1
#6 3 2 2
EDIT
how many ids and items do you have? you could always rename things. e.g.
numberitems<-length(unique(df$id))+9000
items<-data.frame(item=unique(df$item),newitems=c(9000:(numberitems-1)))
numberids<-length(unique(df$id))+1000
ids<-data.frame(id=unique(df$id),newids=c(1000:(numberids-1)))
newdf<-merge(df,items,by="item")
newdf<-merge(newdf,ids,by="id")
DF<-data.frame(item=newdf$newitems,id=newdf$newids)
library(tnet)
item_item<-projecting_tm(DF, method="sum")
names(item_item)<-c("item1","item2","count")
then merge back the original names afterwards....
Upvotes: 3
Reputation: 193527
I suggest this approach because it's not clear from your example output whether the answer from @user1317221_G is exactly what you are looking for. In that example, the combination 2 3
is counted 4 times, twice for item1 = 2, item2 = 3
, and twice for item1 = 3, item2 = 2
.
I would try the combn
function. It doesn't give you exactly the same output that you're looking for, but can probably be adapted for that purpose.
Here is an example.
Write a basic function that will generate combinations of whatever we give it.
myfun = function(x) { apply(combn(x, 2), 2, paste, sep="", collapse="") }
split()
the item
column of your data by id
and use lapply
to generate the combinations within that id
.
temp = split(df$item, df$id)
# Drop any list items that have only one value--combn won't work there!
temp = temp[-(which(sapply(temp,function(x) length(x) == 1),
arr.ind=TRUE))]
temp1 = lapply(temp, function(x) myfun(unique(x)))
Use unlist
and then table
to tabulate the frequencies of each combination.
table(unlist(temp1))
#
# 12 13 23
# 1 1 2
You can have a data.frame
if you prefer.
data.frame(table(unlist(temp)))
# Var1 Freq
# 1 12 1
# 2 13 1
# 3 23 2
As mentioned, with a little bit more elbow grease, you can use this method to match your desired output too:
myfun = function(x) { apply(combn(x, 2), 2, paste, sep="", collapse=",") }
temp = split(df$item, df$id)
temp = temp[-(which(sapply(temp,function(x) length(x) == 1),
arr.ind=TRUE))]
temp1 = lapply(temp, function(x) myfun(unique(x)))
temp1 = data.frame(table(unlist(temp1)))
OUT = data.frame(do.call(rbind,
strsplit(as.character(temp1$Var1), ",")),
temp1$Freq)
names(OUT) = c("item1", "item2", "count")
OUT
# item1 item2 count
# 1 1 2 1
# 2 1 3 1
# 3 2 3 2
Upvotes: 2