I have a dataset of 1000 students and I am trying to understand their diet patterns. I want to count how many students ate:
1. only apple
2. only banana
3. only orange
4. all three fruits
5. apple + banana
6. apple + orange
7. banana + orange
Here is a base R option using table + paste:
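as.data.frame(
  table(
    trimws(                                     # drop leading/trailing blanks
      do.call(
        paste,                                  # paste each row's fruit names together
        as.data.frame(
          ifelse(df[-1] > 0,
                 names(df[-1])[col(df[-1])],    # fruit name where it was eaten
                 ""                             # empty string otherwise
          )
        )
      )
    )
  )
)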
which gives
Var1 Freq
1 apple 5
2 apple orange 2
3 apple banana orange 1
4 banana 3
5 orange 2
Or
as.data.frame(
  table(
    apply(
      as.data.frame(ifelse(df[-1] > 0, names(df[-1])[col(df[-1])], NA)),
      1,                                  # work row by row (one student per row)
      function(x) toString(na.omit(x))    # join the non-NA fruit names with ", "
    )
  )
)
which gives
Var1 Freq
1 apple 5
2 apple, banana, orange 1
3 apple, orange 2
4 banana 3
5 orange 2
Data
df <- data.frame(
student = 1:13,
apple = c(0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0),
banana = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1),
orange = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0)
)
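The outputs above list only the combinations that actually occur, so pairs that no student chose (apple + banana and banana + orange in this sample) are missing. A minimal base R sketch, assuming the same `df` layout (a `student` column followed by one 0/1 column per fruit), that reports every non-empty combination with zeros included. The names `fruits`, `ate`, `combo_label`, `all_combos` and `counts` are illustrative only, and the data is repeated so the snippet runs on its own:

```r
df <- data.frame(
  student = 1:13,
  apple  = c(0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0),
  banana = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1),
  orange = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0)
)

fruits <- names(df)[-1]          # "apple" "banana" "orange"
ate <- as.matrix(df[-1]) > 0     # logical matrix: TRUE if the student ate that fruit

# Label each student by the fruits they ate, e.g. "apple + orange"
combo_label <- apply(ate, 1, function(hit) paste(fruits[hit], collapse = " + "))

# Every non-empty combination of the fruits (singles, pairs, all three)
all_combos <- unlist(lapply(seq_along(fruits), function(k)
  combn(fruits, k, paste, collapse = " + ")))

# A factor with all combinations as levels keeps zero counts in the table
counts <- as.data.frame(table(factor(combo_label, levels = all_combos)))
counts
```

A student who ate none of the fruits gets an empty label; `factor()` turns it into `NA`, which `table()` drops by default, so add a separate level if you also need that group.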
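A dplyr/tidyr option (assuming the ID column is named student_id):
library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-student_id) %>%
  group_by(student_id) %>%
  summarise(name = toString(name[value > 0])) %>%
  count(name)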
# A tibble: 5 x 2
name n
<chr> <int>
1 Apple 5
2 Apple, Banana, orange 1
3 Apple, orange 2
4 Banana 3
5 orange 2
You could do:
library(tidyverse)

data <-
  tibble(student = c(1, 2, 3, 4, 5),
         apple   = c(1, 0, 0, 1, 1),
         banana  = c(0, 0, 1, 0, 1),
         orange  = c(0, 1, 0, 1, 1))

data |>
  pivot_longer(-student, names_to = "fruit") |>
  filter(value == 1) |>
  group_by(student) |>
  summarise(fruit = paste(fruit, collapse = "+")) |>
  count(fruit)
Output:
# A tibble: 5 × 2
fruit n
<chr> <int>
1 apple 1
2 apple+banana+orange 1
3 apple+orange 1
4 banana 1
5 orange 1
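With the full data, every combination that actually occurs will show up. Students who ate none of the fruits are removed by the filter step, so they are not counted.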
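If you also want combinations that nobody chose to appear with a count of 0, one option (a base R swap-in, not part of the pipeline above) is to cross-tabulate the 0/1 columns directly: a 2 x 2 x 2 contingency table has a cell for every combination. A sketch assuming the same five-student data, rebuilt as a plain `data.frame` so it runs without the tidyverse; `ate` and `combos` are illustrative names:

```r
data <- data.frame(
  student = c(1, 2, 3, 4, 5),
  apple  = c(1, 0, 0, 1, 1),
  banana = c(0, 0, 1, 0, 1),
  orange = c(0, 1, 0, 1, 1)
)

# One logical column per fruit: did the student eat it?
ate <- as.data.frame(data[-1] > 0)

# table() on a data frame cross-tabulates its columns; every TRUE/FALSE
# combination gets a cell, so unchosen combinations are kept with Freq = 0
combos <- as.data.frame(table(ate))
combos
```

Each row has one TRUE/FALSE column per fruit plus `Freq`; the all-FALSE row counts students who ate none of the three fruits.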