user6610645
user6610645

Reputation:

Count all possible pairs per group

I couldn't find a relevant post for this, but apologies if something similar has been asked and answered. If I have the following data:

Market  Product Test
A       1       1
A       2       1
A       3       0
A       4       1
B       1       0
B       2       0
B       3       1
B       4       1
C       1       1
C       2       1
C       3       0
C       4       0

If Test = 1, then a product exists in a market. If I want to count the number of markets where two products co-exist for all product-market combinations, how do I do that? For example, 1&2 = 2; 1&3 = 0; 1&4 = 1.

I already calculated the sum of markets for each product using dplyr:

Answer <- Data %>% 
            group_by(Market) %>% 
            summarise(ProductCount = sum(Test))

Upvotes: 1

Views: 146

Answers (3)

Ryan
Ryan

Reputation: 934

The n.markets function below gives a matrix of the counts you want. The first two rows give the product pair, e.g. the first column shows the pair 1&2, and the third row gives the market count.

dat <- data.frame(Market = c(rep('A',4),rep('B',4),rep('C',4)), Product = rep(1:4,3), Test = c(1,1,0,1,0,0,1,1,1,1,0,0))


n.markets <- function(dat){
    combs  <- combn(unique(dat$Product),2)
    mkts   <- unique(as.vector(dat$Market))
    counts <- rep(0,ncol(combs))
    for(j in 1:ncol(combs)){
        for(i in seq_along(mkts)){
            counts[j] <- counts[j] + (sum(dat[dat$Market==mkts[i] & dat$Product %in% combs[,j], 'Test'])==2)
        }
    }
    rbind(combs,counts)
}

n.markets(dat)

##        [,1] [,2] [,3] [,4] [,5] [,6]
##           1    1    1    2    2    3
##           2    3    4    3    4    4
## counts    2    0    1    0    1    1

Edit: The answer in user20650's comment is much faster

tab = crossprod(table(d[d$Test==1, -3]))
tab[lower.tri(tab, diag=TRUE)] <- NA
reshape2::melt(tab, na.rm=TRUE) 

Upvotes: 0

Sotos
Sotos

Reputation: 51592

Here is an idea using only base R. We aggregate on Market and get all possible combinations when Test == 1. We then use table to count the pairs and arrange them in a data.frame

d1 <- aggregate(Product ~ Market, df[df$Test == 1,], FUN = function(i)combn(i, 2, FUN = toString))
d2 <- as.data.frame(table(unlist(d1$Product)), stringsAsFactors = FALSE)

d2
#  Var1 Freq
#1 1, 2    2
#2 1, 4    1
#3 2, 4    1
#4 3, 4    1

However, If you want to include all pairs non-appearing, then,

n <- setdiff(combn(unique(df$Product), 2, toString), d2$Var1)
rbind(d2, data.frame(Var1 = n, Freq = 0, stringsAsFactors = FALSE))

#  Var1 Freq
#1 1, 2    2
#2 1, 4    1
#3 2, 4    1
#4 3, 4    1
#5 1, 3    0
#6 2, 3    0

Upvotes: 1

IRTFM
IRTFM

Reputation: 263481

Looks like a job for table within rows that have Test == 1:

 (combMkt <- with(  Data[ Data$Test==1, ], table(Market, Product)) )
#-- the starting point of all Test=1 rows:
      Product
Market 1 2 3 4
     A 1 1 0 1
     B 0 0 1 1
     C 1 1 0 0

# Choose the ones with 2 or more Markets
 combMkt[ , which( apply( combMkt, 2, sum) >=2 )]
      Product
Market 1 2 4
     A 1 1 1
     B 0 0 1
     C 1 1 0

Upvotes: 2

Related Questions