Harry Palmer
Harry Palmer

Reputation: 478

Count item pairs linked by column value

I'm struggling to solve this problem in R. I have data like this:

item   id
1      500
2      500
2      600
2      700
3      500
3      600

data.frame(item = c(1, 2, 2, 2, 3, 3),
           id = c(500, 500, 600, 700, 500, 600))

And I want to count the number of times a pair of items is linked to the same id. So I want this output:

item1    item2    count
    1        2        1
    2        3        2
    1        3        2

I've tried approaching this with commands like:

x_agg = aggregate(x, by=list(x$id), c)

and then

x_agg_id = lapply(x_agg$item, unique)

thinking that I could then count the occurrence of each item. But the by function seems to create an object of lists, which I don't know how to manipulate. I am hoping there is a simpler way....

Upvotes: 5

Views: 737

Answers (2)

user1317221_G
user1317221_G

Reputation: 15441

# your data
df<-read.table(text="item   id
1      500
2      500
2      600
2      700
3      500
3      600",header=TRUE)


library(tnet)
item_item<-projecting_tm(df, method="sum")
names(item_item)<-c("item1","item2","count")

item_item

  #item1 item2 count
#1     1     2     1
#2     1     3     1
#3     2     1     1
#4     2     3     2
#5     3     1     1
#6     3     2     2

EDIT

how many ids and items do you have? you could always rename things. e.g.

numberitems<-length(unique(df$id))+9000
items<-data.frame(item=unique(df$item),newitems=c(9000:(numberitems-1)))
numberids<-length(unique(df$id))+1000
ids<-data.frame(id=unique(df$id),newids=c(1000:(numberids-1)))
newdf<-merge(df,items,by="item")
newdf<-merge(newdf,ids,by="id")
DF<-data.frame(item=newdf$newitems,id=newdf$newids)

library(tnet)
item_item<-projecting_tm(DF, method="sum")
names(item_item)<-c("item1","item2","count")

then merge back the original names afterwards....

Upvotes: 3

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

I suggest this approach because it's not clear from your example output whether the answer from @user1317221_G is exactly what you are looking for. In that example, the combination 2 3 is counted 4 times, twice for item1 = 2, item2 = 3, and twice for item1 = 3, item2 = 2.

I would try the combn function. It doesn't give you exactly the same output that you're looking for, but can probably be adapted for that purpose.

Here is an example.

  1. Write a basic function that will generate combinations of whatever we give it.

    myfun = function(x) { apply(combn(x, 2), 2, paste, sep="", collapse="") }
    
  2. split() the item column of your data by id and use lapply to generate the combinations within that id.

    temp = split(df$item, df$id)
    # Drop any list items that have only one value--combn won't work there!
    temp = temp[-(which(sapply(temp,function(x) length(x) == 1),
                        arr.ind=TRUE))]
    temp1 = lapply(temp, function(x) myfun(unique(x)))
    
  3. Use unlist and then table to tabulate the frequencies of each combination.

    table(unlist(temp1))
    # 
    # 12 13 23 
    #  1  1  2
    

You can have a data.frame if you prefer.

data.frame(table(unlist(temp)))
#   Var1 Freq
# 1   12    1
# 2   13    1
# 3   23    2

Update

As mentioned, with a little bit more elbow grease, you can use this method to match your desired output too:

myfun = function(x) { apply(combn(x, 2), 2, paste, sep="", collapse=",") }
temp = split(df$item, df$id)
temp = temp[-(which(sapply(temp,function(x) length(x) == 1),
                    arr.ind=TRUE))]
temp1 = lapply(temp, function(x) myfun(unique(x)))
temp1 = data.frame(table(unlist(temp1)))
OUT = data.frame(do.call(rbind, 
                         strsplit(as.character(temp1$Var1), ",")),
                 temp1$Freq)
names(OUT) = c("item1", "item2", "count")
OUT
#   item1 item2 count
# 1     1     2     1
# 2     1     3     1
# 3     2     3     2

Upvotes: 2

Related Questions