Reputation: 51

Find column values that are shared among other column values R

I have the following df, which shows what some people ate over one day.

df = data.frame("Name" = c("Brian", "Brian", "Brian",
                       "Alice", "Alice", "Alice",
                       "Paul", "Paul", "Paul",
                       "Clair", "Clair", "Clair"),
            "Meal" = c("Breakfast", "Lunch", "Dinner",
                       "Breakfast", "Lunch", "Dinner",
                       "Breakfast", "Lunch", "Dinner",
                       "Breakfast", "Lunch", "Dinner"),
            "Food" = c("Waffle", "Chicken", "Steak",
                       "Waffle", "Soup", "Steak",
                       "Waffle", "Chicken", "Chicken",
                       "Waffle", "Soup", "Chicken")
)

I want to find the share of people who ate each food: a food that was eaten by 100% of people, a food that was eaten by 75% of people, and a food that was eaten by 50% of people. In this case, Waffle was eaten by everyone, Chicken was eaten by 75% of people, and Soup and Steak were each eaten by 50% of people.

EDIT:
Expected Output: The percentage of people who ate each food
Waffle - 100%
Chicken - 75%
Steak - 50%
Soup - 50%

Upvotes: 1

Answers (5)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

Here's an approach that uses table:

x <- ((with(df, table(Food, Name)) >= 1) + 0)
## OR x <- table(unique(df[, c("Food", "Name")]))
x
#          Name
# Food      Alice Brian Clair Paul
#   Chicken     0     1     1    1
#   Soup        1     0     1    0
#   Steak       1     1     0    0
#   Waffle      1     1     1    1
rowSums(x)/ncol(x)
# Chicken    Soup   Steak  Waffle 
#    0.75    0.50    0.50    1.00
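If you want the result in the "Food - NN%" form from the question, the proportions above can be sorted and formatted. This is only a sketch: the `rep()` calls just rebuild the question's data compactly, and `pct` and `labels` are names I made up for illustration.

```r
# Rebuild the question's data (same rows, written compactly with rep()).
df <- data.frame(
  Name = rep(c("Brian", "Alice", "Paul", "Clair"), each = 3),
  Meal = rep(c("Breakfast", "Lunch", "Dinner"), times = 4),
  Food = c("Waffle", "Chicken", "Steak",
           "Waffle", "Soup", "Steak",
           "Waffle", "Chicken", "Chicken",
           "Waffle", "Soup", "Chicken")
)

# One row per (Food, Name) pair, so a person eating a food twice counts once.
x <- table(unique(df[, c("Food", "Name")]))

# Share of people per food, sorted from most to least common
# (ties are broken alphabetically by food name).
pct <- rowSums(x) / ncol(x)
pct <- pct[order(-pct, names(pct))]

# Format as "Food - NN%".
labels <- paste0(names(pct), " - ", pct * 100, "%")
print(labels)
```

Sorting on `-pct` and then on the name keeps the order deterministic when two foods share the same percentage.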

Upvotes: 2

Chuck P

Reputation: 3923

library(dplyr)

df %>% 
  distinct(Name, Food) %>% 
  group_by(Food) %>% 
  # divide by the number of distinct people, not the number of distinct foods
  summarise(WhatPercent = n() / n_distinct(df$Name)) %>%
  arrange(desc(WhatPercent)) %>%
  mutate(WhatPercent = paste0(WhatPercent * 100, "%"))


#> `summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  Food    WhatPercent
  <chr>   <chr>      
1 Waffle  100%       
2 Chicken 75%        
3 Soup    50%        
4 Steak   50%

Your data

df <- data.frame("Name" = c("Brian", "Brian", "Brian",
                           "Alice", "Alice", "Alice",
                           "Paul", "Paul", "Paul",
                           "Clair", "Clair", "Clair"),
                "Meal" = c("Breakfast", "Lunch", "Dinner",
                           "Breakfast", "Lunch", "Dinner",
                           "Breakfast", "Lunch", "Dinner",
                           "Breakfast", "Lunch", "Dinner"),
                "Food" = c("Waffle", "Chicken", "Steak",
                           "Waffle", "Soup", "Steak",
                           "Waffle", "Chicken", "Chicken",
                           "Waffle", "Soup", "Chicken")
)

Upvotes: 2

Matt

Reputation: 7385

You can use dplyr and janitor:

library(dplyr)
library(janitor)

df %>% 
  tabyl(Food, Name) %>% 
  mutate_if(is.numeric, ~ ifelse(. >= 1, 1, 0)) %>% 
  mutate(n = length(.) - 1) %>% 
  adorn_totals('col') %>% 
  mutate(Percent = paste0((Total - n)/n*100, "%")) %>% 
  select(Food, Percent)

This gives you:

    Food Percent
 Chicken     75%
    Soup     50%
   Steak     50%
  Waffle    100%

You can also change the last select argument to select(-c(n, Total)) if you want to keep counts for each person:

    Food Alice Brian Clair Paul Percent
 Chicken     0     1     1    1     75%
    Soup     1     0     1    0     50%
   Steak     1     1     0    0     50%
  Waffle     1     1     1    1    100%

Upvotes: 1

Daniel O

Reputation: 4358

Edit: With expected output explained

apply(aggregate(Food ~ Name, df, table)[-1],2, function(x) sum(x!=0)/length(x))*100

Food.Chicken    Food.Soup   Food.Steak  Food.Waffle 
          75           50           50          100
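One caveat, as far as I can tell: this one-liner relies on `Food` being a factor, so that `table()` returns a count for every food (zeros included) within each person. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so it may not give the same result on a fresh session. A minimal sketch that converts the column explicitly first (the `rep()` calls only rebuild the question's data, and `res` is an illustrative name):

```r
# Rebuild the question's data (same rows, written compactly with rep()).
df <- data.frame(
  Name = rep(c("Brian", "Alice", "Paul", "Clair"), each = 3),
  Meal = rep(c("Breakfast", "Lunch", "Dinner"), times = 4),
  Food = c("Waffle", "Chicken", "Steak",
           "Waffle", "Soup", "Steak",
           "Waffle", "Chicken", "Chicken",
           "Waffle", "Soup", "Chicken")
)

# Make Food a factor so table() reports all four foods for every person,
# including the ones that person did not eat (count 0).
df$Food <- factor(df$Food)

# Per-person food counts, then the share of people with a non-zero count.
res <- apply(aggregate(Food ~ Name, df, table)[-1], 2,
             function(x) sum(x != 0) / length(x)) * 100
print(res)
```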

Old Answers

You should give an expected output, as this question is unclear. Here is some code to rearrange your data into a form you may find more suitable for calculating statistics.

aggregate(Food ~ Meal, df, table)

      Meal Food.Chicken Food.Soup Food.Steak Food.Waffle
1 Breakfast            0         0          0           4
2    Dinner            2         0          2           0
3     Lunch            2         2          0           0

To find the most popular Food at each meal:

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

# as.character() works whether Food is a factor or a character column
aggregate(Food ~ Meal, df, function(x) as.character(Modes(x)))

       Meal           Food
1 Breakfast         Waffle
2    Dinner Steak, Chicken
3     Lunch  Chicken, Soup

Credit for Modes Function

Upvotes: 0

Chris Ruehlemann

Reputation: 21400

Is this what you want?

apply(aggregate(Food ~ Name, df, function(x) ifelse(table(x) == 0, 0, 1))[-1], 2, sum)
Food.Chicken    Food.Soup   Food.Steak  Food.Waffle 
           3            2            2            4

Or would you prefer this?

apply(aggregate(Food ~ Name, df, function(x) ifelse(table(x) == 0, 0, 1))[-1], 2, 
      function(x)  ifelse(sum(x) == length(unique(df$Name)), "100%",  
                          ifelse(sum(x) == length(unique(df$Name)) - 1, "75%",
                                 ifelse(sum(x) == length(unique(df$Name)) - 2, "50%", 
                                        ifelse(sum(x) == length(unique(df$Name)) - 3, "25%", "0%")))))
Food.Chicken    Food.Soup   Food.Steak  Food.Waffle 
       "75%"        "50%"        "50%"       "100%"
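The nested `ifelse()` above only covers the case of four people. A possible generalisation is to compute the label straight from the counts. This is a sketch rather than a drop-in replacement: it uses `tapply()` instead of `aggregate()`, and `counts` and `labels` are names I chose for illustration.

```r
# Rebuild the question's data (same rows, written compactly with rep()).
df <- data.frame(
  Name = rep(c("Brian", "Alice", "Paul", "Clair"), each = 3),
  Meal = rep(c("Breakfast", "Lunch", "Dinner"), times = 4),
  Food = c("Waffle", "Chicken", "Steak",
           "Waffle", "Soup", "Steak",
           "Waffle", "Chicken", "Chicken",
           "Waffle", "Soup", "Chicken")
)

# Number of distinct people who ate each food.
counts <- tapply(df$Name, df$Food, function(x) length(unique(x)))

# Convert the counts to percentage labels for any number of people.
n_people <- length(unique(df$Name))
labels <- setNames(paste0(counts / n_people * 100, "%"), names(counts))
print(labels)
```

With other data the percentages may not be whole numbers, in which case wrapping the value in `round()` before pasting is probably what you want.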

Upvotes: 0
