user142632

Reputation: 61

How to get the counts of all possible combinations in R without repetition?

I have a dataset with 1000 samples and am trying to understand the diet patterns of students. I want to know how many students have taken:

1. only apple
2. only banana
3. only orange
4. all three fruits
5. apple + banana
6. apple + orange
7. banana + orange

Upvotes: 0

Views: 248

Answers (3)

ThomasIsCoding

Reputation: 102625

Here is a base R option using table + paste

as.data.frame(
  table(
    trimws(
      do.call(
        paste,
        as.data.frame(
          ifelse(df[-1] > 0,
            names(df[-1])[col(df[-1])], 
            ""
          )
        )
      )
    )
  )
)

which gives

                 Var1 Freq
1               apple    5
2       apple  orange    2
3 apple banana orange    1
4              banana    3
5              orange    2

Or

as.data.frame(
  table(
    apply(
      as.data.frame(ifelse(df[-1] > 0, names(df[-1])[col(df[-1])], NA)), 
      1, 
      function(x) toString(na.omit(x))
    )
  )
)

which gives

                   Var1 Freq
1                 apple    5
2 apple, banana, orange    1
3         apple, orange    2
4                banana    3
5                orange    2

Data

df <- data.frame(
  student = 1:13,
  apple = c(0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0),
  banana = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1),
  orange = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0)
)
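The outputs above list only the combinations that occur in the data. If the zero-count combinations from the question (for example apple + banana) should appear too, one possible sketch is to cross-tabulate the 0/1 columns directly. This is an addition rather than part of the answer's code; it assumes the same `df` as above, and `counts` is just an illustrative name.

```r
df <- data.frame(
  student = 1:13,
  apple = c(0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0),
  banana = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1),
  orange = c(0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0)
)

# Three-way contingency table of the 0/1 columns, flattened to one row
# per pattern; patterns that never occur are kept with Freq = 0
counts <- as.data.frame(table(df[-1]))

# Drop the "no fruit at all" pattern, leaving the seven combinations
counts <- counts[with(counts, apple == "1" | banana == "1" | orange == "1"), ]
counts
```

With this sample data, the apple + banana and banana + orange rows have a `Freq` of 0, and the remaining counts agree with the tables above.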

Upvotes: 0

Onyambu

Reputation: 79338

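library(dplyr)
library(tidyr)

# assumes `df` has a `student_id` column plus one 0/1 column per fruit
df %>%
   pivot_longer(-student_id) %>%
   group_by(student_id) %>%
   summarise(name = toString(name[value > 0])) %>%
   count(name)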

# A tibble: 5 x 2
  name                      n
  <chr>                 <int>
1 Apple                     5
2 Apple, Banana, orange     1
3 Apple, orange             2
4 Banana                    3
5 orange                    2

Upvotes: 2

harre

Reputation: 7297

You could do:

library(tidyverse)

data <-
  tibble(student = c(1,2,3,4,5),
        apple = c(1,0,0,1,1),
        banana = c(0,0,1,0,1),
        orange = c(0,1,0,1,1))

data |>
  pivot_longer(-student, names_to = "fruit") |>
  filter(value == 1) |>
  group_by(student) |>
  summarise(fruit = paste(fruit, collapse = "+")) |>
  count(fruit)

Output:

# A tibble: 5 × 2
  fruit                   n
  <chr>               <int>
1 apple                   1
2 apple+banana+orange     1
3 apple+orange            1
4 banana                  1
5 orange                  1

With the full data, every combination that actually occurs will show up.
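Combinations that no student has taken will not appear in that output. If they should be listed with a count of 0, a minimal sketch (an addition, assuming the same `data` tibble as above; `all_counts` is an illustrative name) is to count the 0/1 patterns and fill in the missing ones with `tidyr::complete()`:

```r
library(dplyr)
library(tidyr)

data <- tibble(
  student = c(1, 2, 3, 4, 5),
  apple   = c(1, 0, 0, 1, 1),
  banana  = c(0, 0, 1, 0, 1),
  orange  = c(0, 1, 0, 1, 1)
)

all_counts <- data |>
  # one row per 0/1 pattern that occurs, with its count in `n`
  count(apple, banana, orange) |>
  # add the patterns that never occur, with a count of 0
  complete(apple = c(0, 1), banana = c(0, 1), orange = c(0, 1),
           fill = list(n = 0L)) |>
  # drop the "no fruit at all" pattern
  filter(apple + banana + orange > 0)

all_counts
```

This keeps the indicator columns instead of building a label, so each of the seven combinations from the question gets exactly one row.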

Upvotes: 0
