mvanaman
mvanaman

Reputation: 333

Across several columns, count instances of pairs

I want to count pairs across several columns. That is, with more than two columns, count the number of times particular value pairs occur in the same row.

Say I asked some people whether they liked different kinds of food, and they could answer "yes" or "no". I ended up with this data set:

foods <- 
  data.frame(
    fruit = c("yes", "yes", "no"),
    veg = c("yes", "yes", "yes"),
    meat = c("yes", "no", "yes")
  )
foods

I'd like to count the number of times any two foods received a "yes". I hope to end up with something like this:

desired <- 
  data.frame(
    pair.1 = c("fruit", "fruit", "veg"),
    pair.2 = c("veg", "meat", "meat"),
    Freq = c(2, 1, 2)
  )
desired

This could also work:

desired.2 <- 
  data.frame(
    pair. = c("fruit, veg", "fruit, meat", "veg, meat"),
    Freq = c(2, 1, 2)
  )
desired.2

If possible, I hope I could use a solution to eventually do the same for combinations of 3, 4, and so on (my actual data has more than 3 columns). What's a good solution to this, preferably using dplyr?

Thank you in advance for your help!

Upvotes: 2

Views: 236

Answers (3)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

For pairs, it sounds like you're looking for crossprod:

crossprod(foods == "yes")
#       fruit veg meat
# fruit     2   2    1
# veg       2   3    2
# meat      1   2    2

Note the upper.tri or lower.tri of the result above.

If we treat the result as a table, we can wrap it in data.frame and extract just those specific pairs:

x <- crossprod(foods == "yes")
data.frame(as.table(x))[lower.tri(x), ]
#   Var1  Var2 Freq
# 2  veg fruit    2
# 3 meat fruit    1
# 6 meat   veg    2

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 39858

One dplyr and purrr solution could be:

map_dfr(.x = combn(names(foods), 2, simplify = FALSE),
        ~ foods %>%
         select(.x) %>%
         summarise(pair_1 = .x[1],
                   pair_2 = .x[2],
                   n = sum(rowSums(select(., everything()) == "yes") == 2)))

  pair_1 pair_2 n
1  fruit    veg 2
2  fruit   meat 1
3    veg   meat 2

If you want something more generalizable:

fun <- function(x) {
map_dfr(.x = combn(names(foods), x, simplify = FALSE),
        ~ foods %>%
         select(.x) %>%
         summarise(pairs = paste(.x, collapse = " "),
                   n = sum(rowSums(select(., everything()) == "yes") == x)))
}

fun(2)

       pairs n
1  fruit veg 2
2 fruit meat 1
3   veg meat 2

Upvotes: 3

Sotos
Sotos

Reputation: 51592

You can use combn, i.e.

combn(names(foods), 2, FUN = function(i){i1 <- foods[i]; sum(i1[1] == i1[2])})
#[1] 2 1 2

Upvotes: 1

Related Questions