Reputation: 179
I've been struggling with a problem for a few days, concerning the use of group_by() and summarise(). I have nutritional data similar to this data set:
library(tidyverse)
myData <- tibble(
  id = factor(c(rep("1", 5), rep("2", 4), rep("3", 6), rep("4", 5))),
  gender = factor(c(rep("M", 5), rep("F", 4), rep("F", 6), rep("M", 5))),
  age = c(rep("20-29", 5), rep("20-29", 4), rep("40-49", 6), rep("30-39", 5)),
  bmi = c(rep("normal", 5), rep("normal", 4), rep("overweighted", 6), rep("underweighted", 5)),
  food = factor(c("A", "A", "B", "C", "D", "D", "D", "A", "A", "B", "A", "B", "C", "C", "B", "D", "C", "E", "E", "A")),
  food_class = factor(c("sweet", "sweet", "salty", "bitter", "acid", "acid", "acid", "sweet", "sweet",
                        "salty", "sweet", "salty", "bitter", "bitter", "salty", "acid", "bitter",
                        "Other", "Other", "sweet")),
  quantity = c(25, 10, 15, 5, 15, 15, 30, 15, 5, 5, 10, 30, 15, 30, 10, 5, 5, 10, 15, 25)
)
myData %>%
  group_by(id, food, gender, food_class) %>%
  summarise(sum_quantity = sum(quantity)) %>%
  ungroup() %>%
  complete(id, food, food_class, fill = list(sum_quantity = 0)) %>%
  group_by()
What I get is:
# A tibble: 100 x 5
   id    food  food_class gender sum_quantity
   <fct> <fct> <fct>      <fct>         <dbl>
 1 1     A     acid       NA                0
 2 1     A     bitter     NA                0
 3 1     A     Other      NA                0
 4 1     A     salty      NA                0
 5 1     A     sweet      M                35
 6 1     B     acid       NA                0
 7 1     B     bitter     NA                0
 8 1     B     Other      NA                0
 9 1     B     salty      M                15
10 1     B     sweet      NA                0
# … with 90 more rows
I want to analyse the nutritional data in my data set and evaluate the consumption of each food_class by summing the quantity eaten by each person. For that I need to keep the zero counts in the mean calculation, otherwise it would be biased. But I also want to keep information such as gender or age group, so that I can determine the pattern of food consumption for each gender, age group, etc.
With .drop = FALSE, I get nonsensical combinations of my variables, since every id is combined with both genders even though a given id has only one gender. When I use complete(), I get a lot of NA values, which complicates the analysis because I can't use the fill argument for columns such as gender or age, whose values depend on the id.
Any ideas on how to solve my problem? Thanks a lot.
Upvotes: 1
Views: 2508
Reputation: 35594
Use nesting() in complete() to keep only the combinations of values that appear in the data.
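myData %>%
  # group by every column except quantity
  group_by_at(vars(-quantity)) %>%
  summarise(sum_quantity = sum(quantity)) %>%
  ungroup() %>%
  # nesting() keeps only the combinations that occur in the data, so each id
  # stays with its own gender/age/bmi and each food with its own food_class
  complete(nesting(id, gender, age, bmi),
           nesting(food, food_class),
           fill = list(sum_quantity = 0))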
# # A tibble: 20 x 7
#    id    gender age   bmi           food  food_class sum_quantity
#    <fct> <fct>  <chr> <chr>         <fct> <fct>             <dbl>
#  1 1     M      20-29 normal        A     sweet                35
#  2 1     M      20-29 normal        B     salty                15
#  3 1     M      20-29 normal        C     bitter                5
#  4 1     M      20-29 normal        D     acid                 15
#  5 1     M      20-29 normal        E     Other                 0
#  6 2     F      20-29 normal        A     sweet                20
#  7 2     F      20-29 normal        B     salty                 0
#  8 2     F      20-29 normal        C     bitter                0
#  9 2     F      20-29 normal        D     acid                 45
# 10 2     F      20-29 normal        E     Other                 0
# 11 3     F      40-49 overweighted  A     sweet                10
# 12 3     F      40-49 overweighted  B     salty                45
# 13 3     F      40-49 overweighted  C     bitter               45
# 14 3     F      40-49 overweighted  D     acid                  0
# 15 3     F      40-49 overweighted  E     Other                 0
# 16 4     M      30-39 underweighted A     sweet                25
# 17 4     M      30-39 underweighted B     salty                 0
# 18 4     M      30-39 underweighted C     bitter                5
# 19 4     M      30-39 underweighted D     acid                  5
# 20 4     M      30-39 underweighted E     Other                25
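Since the stated goal is a mean that includes the zero counts, here is a minimal sketch of that follow-up step on a small, made-up data set. The `toy` tibble and the names `totals`, `means` and `mean_quantity` are assumptions for illustration and do not come from the question. The columns are plain character vectors, so unused factor levels play no part.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: three people, two food classes, one row per intake.
toy <- tibble(
  id         = c("1", "1", "1", "2", "3"),
  gender     = c("M", "M", "M", "F", "F"),
  food_class = c("sweet", "sweet", "salty", "sweet", "salty"),
  quantity   = c(10, 5, 20, 30, 10)
)

# Per-person totals; nesting(id, gender) keeps each id with its own gender,
# while food_class is crossed with it so missing classes appear as 0.
totals <- toy %>%
  group_by(id, gender, food_class) %>%
  summarise(sum_quantity = sum(quantity)) %>%
  ungroup() %>%
  complete(nesting(id, gender), food_class, fill = list(sum_quantity = 0))

# Mean per gender and food class, with the zero rows counted in the mean.
means <- totals %>%
  group_by(gender, food_class) %>%
  summarise(mean_quantity = mean(sum_quantity)) %>%
  ungroup()

means
```

In this toy data the mean for gender F and food class salty is 5, because the zero for id 2 is counted. Without the completed rows it would be 10.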
Upvotes: 1