Reputation: 51
I have the following data frame, df, which shows what several people ate over one day.
df <- data.frame("Name" = c("Brian", "Brian", "Brian",
                            "Alice", "Alice", "Alice",
                            "Paul", "Paul", "Paul",
                            "Clair", "Clair", "Clair"),
                 "Meal" = c("Breakfast", "Lunch", "Dinner",
                            "Breakfast", "Lunch", "Dinner",
                            "Breakfast", "Lunch", "Dinner",
                            "Breakfast", "Lunch", "Dinner"),
                 "Food" = c("Waffle", "Chicken", "Steak",
                            "Waffle", "Soup", "Steak",
                            "Waffle", "Chicken", "Chicken",
                            "Waffle", "Soup", "Chicken")
)
For each food, I want to find the percentage of people who ate it at least once during the day. In this case, Waffle was eaten by everyone (100%), Chicken was eaten by 75% of people, and Soup and Steak were each eaten by 50% of people.
EDIT:
Expected Output: The percentage of people who ate each food
Waffle - 100%
Chicken - 75%
Steak - 50%
Soup - 50%
Upvotes: 1
Views: 100
Reputation: 193527
Here's an approach that uses table:
x <- ((with(df, table(Food, Name)) >= 1) + 0)
## OR x <- table(unique(df[, c("Food", "Name")]))
x
#          Name
# Food      Alice Brian Clair Paul
#   Chicken     0     1     1    1
#   Soup        1     0     1    0
#   Steak       1     1     0    0
#   Waffle      1     1     1    1
rowSums(x)/ncol(x)
# Chicken    Soup   Steak  Waffle
#    0.75    0.50    0.50    1.00
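As a possible follow-up, the proportions above can be turned into the "Food - NN%" lines from the question. This is only a sketch: the names `pct` and `out` are my own, and the data frame is rebuilt with `rep()` purely to keep the example self-contained.

```r
# Rebuild the question's data so the example runs on its own
df <- data.frame(
  Name = rep(c("Brian", "Alice", "Paul", "Clair"), each = 3),
  Meal = rep(c("Breakfast", "Lunch", "Dinner"), times = 4),
  Food = c("Waffle", "Chicken", "Steak",
           "Waffle", "Soup", "Steak",
           "Waffle", "Chicken", "Chicken",
           "Waffle", "Soup", "Chicken")
)

# 0/1 indicator table: did this person eat this food at least once?
x <- table(unique(df[, c("Food", "Name")]))

# Share of people per food, as a percentage, largest first
pct <- sort(rowSums(x) / ncol(x) * 100, decreasing = TRUE)

# One "Food - NN%" string per food
out <- paste0(names(pct), " - ", pct, "%")
print(out)
```

Sorting before formatting keeps the values numeric for as long as possible, so the ordering is by percentage rather than by string.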
Upvotes: 2
Reputation: 3923
library(dplyr)
df %>%
  distinct(Name, Food) %>%   # one row per person/food pair
  group_by(Food) %>%
  # divide by the number of distinct people, not the number of distinct foods
  summarise(WhatPercent = n() / n_distinct(.$Name)) %>%
  arrange(desc(WhatPercent)) %>%
  mutate(WhatPercent = paste0(WhatPercent * 100, "%"))
#> `summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 2
  Food    WhatPercent
  <chr>   <chr>
1 Waffle  100%
2 Chicken 75%
3 Soup    50%
4 Steak   50%
Your data
df <- data.frame("Name" = c("Brian", "Brian", "Brian",
"Alice", "Alice", "Alice",
"Paul", "Paul", "Paul",
"Clair", "Clair", "Clair"),
"Meal" = c("Breakfast", "Lunch", "Dinner",
"Breakfast", "Lunch", "Dinner",
"Breakfast", "Lunch", "Dinner",
"Breakfast", "Lunch", "Dinner"),
"Food" = c("Waffle", "Chicken", "Steak",
"Waffle", "Soup", "Steak",
"Waffle", "Chicken", "Chicken",
"Waffle", "Soup", "Chicken")
)
Upvotes: 2
Reputation: 7385
You can use dplyr and janitor:
library(dplyr)
library(janitor)
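df %>%
  tabyl(Food, Name) %>%                              # Food x Name count table
  mutate_if(is.numeric, ~ ifelse(. >= 1, 1, 0)) %>%  # 1 if the person ate the food at all
  mutate(n = length(.) - 1) %>%                      # number of people (columns minus Food)
  adorn_totals('col') %>%                            # Total = people who ate it + n
  mutate(Percent = paste0((Total - n)/n*100, "%")) %>%
  select(Food, Percent)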
This gives you:
Food     Percent
Chicken  75%
Soup     50%
Steak    50%
Waffle   100%
You can also change the final select() call to select(-c(n, Total)) if you want to keep the 0/1 indicators for each person:
Food     Alice Brian Clair Paul Percent
Chicken      0     1     1    1     75%
Soup         1     0     1    0     50%
Steak        1     1     0    0     50%
Waffle       1     1     1    1    100%
Upvotes: 1
Reputation: 4358
Edit: With expected output explained
apply(aggregate(Food ~ Name, df, table)[-1],2, function(x) sum(x!=0)/length(x))*100
Food.Chicken    Food.Soup   Food.Steak  Food.Waffle
          75           50           50          100
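The aggregate(..., table) call above appears to rely on Food being a factor, which was the data.frame() default before R 4.0.0. With character columns, table() only reports the foods each person actually ate, so the per-person results have different lengths. A minimal sketch that should not depend on the column type, assuming the same df (rebuilt here so the block is self-contained; `pct` is a name I made up):

```r
# Rebuild the question's data so the example runs on its own
df <- data.frame(
  Name = rep(c("Brian", "Alice", "Paul", "Clair"), each = 3),
  Meal = rep(c("Breakfast", "Lunch", "Dinner"), times = 4),
  Food = c("Waffle", "Chicken", "Steak",
           "Waffle", "Soup", "Steak",
           "Waffle", "Chicken", "Chicken",
           "Waffle", "Soup", "Chicken")
)

# Rows are people, columns are foods; TRUE where the person ate the food
ate <- table(df$Name, df$Food) > 0

# The column mean of a logical matrix is the share of TRUE values per food
pct <- colMeans(ate) * 100
pct
# Chicken    Soup   Steak  Waffle
#      75      50      50     100
```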
Old Answers
You should give an expected output, as this question is unclear. Here is some code to rearrange your data into a form you may find more suitable for calculating statistics.
aggregate(Food ~ Meal, df, table)
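       Meal Food.Chicken Food.Soup Food.Steak Food.Waffle
1 Breakfast            0         0          0           4
2    Dinner            2         0          2           0
3     Lunch            2         2          0           0
To find the most popular food at each meal:
Modes <- function(x) {
  ux <- unique(x)                  # distinct values, in order of first appearance
  tab <- tabulate(match(x, ux))    # how often each distinct value occurs
  ux[tab == max(tab)]              # every value tied for the highest count
}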
aggregate(Food ~ Meal, df, function(x) levels(x)[Modes(x)] )
       Meal           Food
1 Breakfast         Waffle
2    Dinner Steak, Chicken
3     Lunch  Chicken, Soup
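The levels(x)[Modes(x)] step assumes Food is a factor. If Food is a character column (the data.frame() default from R 4.0.0 onward), a variant along these lines may be easier to use. It is a sketch only: `most_popular` is a hypothetical helper, not part of the code above, and it lists ties alphabetically rather than in order of appearance.

```r
# Rebuild the question's data so the example runs on its own
df <- data.frame(
  Name = rep(c("Brian", "Alice", "Paul", "Clair"), each = 3),
  Meal = rep(c("Breakfast", "Lunch", "Dinner"), times = 4),
  Food = c("Waffle", "Chicken", "Steak",
           "Waffle", "Soup", "Steak",
           "Waffle", "Chicken", "Chicken",
           "Waffle", "Soup", "Chicken")
)

# Hypothetical helper: return the most frequent value(s) as one string
most_popular <- function(x) {
  counts <- table(x)                                   # frequency of each food
  paste(names(counts)[counts == max(counts)], collapse = ", ")
}

# One row per meal, with the most popular food(s) at that meal
res <- aggregate(Food ~ Meal, df, most_popular)
res
```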
Upvotes: 0
Reputation: 21400
Is this what you want?
apply(aggregate(Food ~ Name, df, function(x) ifelse(table(x) == 0, 0, 1))[-1], 2, sum)
Food.Chicken    Food.Soup   Food.Steak  Food.Waffle
           3            2            2            4
Or would you prefer this?
apply(aggregate(Food ~ Name, df, function(x) ifelse(table(x) == 0, 0, 1))[-1], 2,
      function(x) ifelse(sum(x) == length(unique(df$Name)), "100%",
                  ifelse(sum(x) == length(unique(df$Name)) - 1, "75%",
                  ifelse(sum(x) == length(unique(df$Name)) - 2, "50%",
                  ifelse(sum(x) == length(unique(df$Name)) - 3, "25%", "0%")))))
Food.Chicken    Food.Soup   Food.Steak  Food.Waffle
       "75%"        "50%"        "50%"       "100%"
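The nested ifelse() hard-codes the labels for exactly four people. One way to get the same labels for any number of people is to compute the percentage directly. This is a sketch under the assumption that counting distinct names per food is what is wanted; `n_people`, `eaters` and `labels` are names I introduced for illustration.

```r
# Rebuild the question's data so the example runs on its own
df <- data.frame(
  Name = rep(c("Brian", "Alice", "Paul", "Clair"), each = 3),
  Meal = rep(c("Breakfast", "Lunch", "Dinner"), times = 4),
  Food = c("Waffle", "Chicken", "Steak",
           "Waffle", "Soup", "Steak",
           "Waffle", "Chicken", "Chicken",
           "Waffle", "Soup", "Chicken")
)

# Total number of distinct people
n_people <- length(unique(df$Name))

# Number of distinct people who ate each food
eaters <- tapply(df$Name, df$Food, function(x) length(unique(x)))

# Percentage labels, named by food
labels <- setNames(paste0(100 * eaters / n_people, "%"), names(eaters))
labels
# Chicken    Soup   Steak  Waffle
#   "75%"   "50%"   "50%"  "100%"
```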
Upvotes: 0