Reputation: 333
I want to count pairs across several columns. That is, with more than two columns, count the number of times particular value pairs occur in the same row.
Say I asked some people whether they liked different kinds of food, and they could answer "yes" or "no". I ended up with this data set:
foods <-
data.frame(
fruit = c("yes", "yes", "no"),
veg = c("yes", "yes", "yes"),
meat = c("yes", "no", "yes")
)
foods
I'd like to count the number of times any two foods received a "yes". I hope to end up with something like this:
desired <-
data.frame(
pair.1 = c("fruit", "fruit", "veg"),
pair.2 = c("veg", "meat", "meat"),
Freq = c(2, 1, 2)
)
desired
This could also work:
desired.2 <-
data.frame(
pair. = c("fruit, veg", "fruit, meat", "veg, meat"),
Freq = c(2, 1, 2)
)
desired.2
If possible, I hope I could use a solution to eventually do the same for combinations of 3, 4, and so on (my actual data has more than 3 columns). What's a good solution to this, preferably using dplyr?
Thank you in advance for your help!
Upvotes: 2
Views: 236
Reputation: 193527
For pairs, it sounds like you're looking for crossprod
:
crossprod(foods == "yes")
# fruit veg meat
# fruit 2 2 1
# veg 2 3 2
# meat 1 2 2
Note the upper.tri
or lower.tri
of the result above.
If we treat the result as a table
, we can wrap it in data.frame
and extract just those specific pairs:
x <- crossprod(foods == "yes")
data.frame(as.table(x))[lower.tri(x), ]
# Var1 Var2 Freq
# 2 veg fruit 2
# 3 meat fruit 1
# 6 meat veg 2
Upvotes: 1
Reputation: 39858
One dplyr
and purrr
solution could be:
map_dfr(.x = combn(names(foods), 2, simplify = FALSE),
~ foods %>%
select(.x) %>%
summarise(pair_1 = .x[1],
pair_2 = .x[2],
n = sum(rowSums(select(., everything()) == "yes") == 2)))
pair_1 pair_2 n
1 fruit veg 2
2 fruit meat 1
3 veg meat 2
If you want something more generalizable:
fun <- function(x) {
map_dfr(.x = combn(names(foods), x, simplify = FALSE),
~ foods %>%
select(.x) %>%
summarise(pairs = paste(.x, collapse = " "),
n = sum(rowSums(select(., everything()) == "yes") == x)))
}
fun(2)
pairs n
1 fruit veg 2
2 fruit meat 1
3 veg meat 2
Upvotes: 3
Reputation: 51592
You can use combn
, i.e.
combn(names(foods), 2, FUN = function(i){i1 <- foods[i]; sum(i1[1] == i1[2])})
#[1] 2 1 2
Upvotes: 1