Mili
Mili

Reputation: 77

Counting groups with shared elements

I have a data set which looks like below:

Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y

So basically, Comp1 and Comp2 are the companies and Product is the product name that these companies have in common. I want the output to appear something like this:

Product Bundle    Count
P1,P2,P3          2
P2,P4             1

I am new to R and would appreciate any help in this case.

Upvotes: 3

Views: 96

Answers (3)

Brian Davis
Brian Davis

Reputation: 992

If you prefer using base r as I do, this is an idea:

dtb <- table(paste(dd[[1]],dd[[3]]),dd[[2]])
out <- sapply(1:nrow(dtb),function(x) paste(colnames(dtb)[dtb[x,] == 
1],collapse = ","))
table(out)
out
P1,P2,P3    P2,P4 
       2        1

Upvotes: 3

Jaap
Jaap

Reputation: 83215

A solution with data.table:

library(data.table)
setDT(d)[order(Product), Prod.Bundle := toString(Product), by = Comp1
         ][, .(Count = uniqueN(Comp2)), by = Prod.Bundle]

Or another as provided by @Frank in the comments:

setDT(d)[order(Product), toString(Product), by = Comp1
         ][, .(Count = .N), by = .(Prod.Bundle = V1)]

which gives:

   Prod.Bundle Count
1:  P1, P2, P3     2
2:      P2, P4     1

Used data:

d <- read.table(text="Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y", header=TRUE, stringsAsFactors=FALSE)

Upvotes: 3

MrFlick
MrFlick

Reputation: 206187

Using dplyr, you can summarize the data then count it. For example

library(dplyr)

dd %>% arrange(Comp1, Product) %>% 
  group_by(Comp1) %>% 
  summarize(bundle=paste(unique(Product), collapse=",")) %>% 
  count(bundle)

#     bundle     n
#      <chr> <int>
# 1 P1,P2,P3     2
# 2    P2,P4     1

with the test data

dd <- read.table(text="Comp1  Product  Comp2
A       P1      B
A       P2      B
A       P3      B
C       P4      D
C       P2      D
X       P1      Y
X       P2      Y
X       P3      Y", header=TRUE)

Upvotes: 4

Related Questions