Reputation: 741
I have a data frame that I want to group by two variables, and then summarize the total and average.
I tried this on my data, and it gives the correct result:
df %>%
  group_by(date, group) %>%
  summarise(
    weight = sum(ind_weigh),
    total_usage = sum(total_usage_min),
    Avg_usage = total_usage / weight) %>%
  ungroup()
It returns this data frame:
df <- tibble::tribble(
  ~date,    ~group, ~weight, ~total_usage, ~Avg_usage,
  20190201, 0,      450762,  67184943,     149,
  20190201, 1,      2788303, 385115718,    138,
  20190202, 0,      483959,  60677765,     125,
  20190202, 1,      2413699, 311226351,    129,
  20190203, 0,      471189,  59921762,     127,
  20190203, 1,      2143811, 277425186,    129,
  20190204, 0,      531020,  83695977,     158,
  20190204, 1,      2640087, 403200829,    153
)
How can I add another variable to my script that gives avg_usage_total as well, i.e. each row's total_usage divided by the combined weight of group 0 and group 1 for that date?
Expected result:
For example, first row: 67184943 / (450762 + 2788303) = 20.7
date      group  rech     total_usage  Avg_usage  Avg_usage_total
20190201  0      450762   67184943     149        20.7
20190201  1      2788303  385115718    138        118.9
Upvotes: 2
Views: 518
Reputation: 581
You can do that using mutate, together with group_by if necessary.
library(tidyverse)
# generate dataset
(df <- tibble(
  date = c(rep(Sys.Date(), 10), rep(Sys.Date() - 1, 10)),
  group = rbinom(20, 1, 0.5),
  rech = runif(20),
  weight = runif(20),
  total_usage = runif(20)
))
# A tibble: 20 x 5
   date       group  rech weight total_usage
   <date>     <int> <dbl>  <dbl>       <dbl>
 1 2019-03-10     0 0.985  0.831       0.963
 2 2019-03-10     1 0.178  0.990       0.676
 3 2019-03-10     1 0.505  0.697       0.152
 4 2019-03-10     1 0.416  0.165       0.824
 5 2019-03-10     0 0.554  0.790       0.974
# step 1 of analysis
(df <- df %>%
  group_by(date, group) %>%
  summarise(rech = sum(rech),
            weight = sum(weight),
            total_usage = sum(total_usage)) %>%
  mutate(Avg_usage = total_usage / weight))
# A tibble: 4 x 6
# Groups:   date [2]
  date       group  rech weight total_usage Avg_usage
  <date>     <int> <dbl>  <dbl>       <dbl>     <dbl>
1 2019-03-09     0  3.29   4.82        3.03     0.628
2 2019-03-09     1  1.45   1.22        1.16     0.954
3 2019-03-10     0  1.54   1.62        1.94     1.20
4 2019-03-10     1  3.15   4.55        4.63     1.02
# step 2 of analysis
df %>%
  group_by(date) %>% # only needed if Avg_usage_total should be computed per date
  mutate(Avg_usage_total = total_usage / sum(rech)) %>% # total_usage is taken row by row; sum(rech) is taken over each date group (or over the whole column if ungrouped)
  ungroup()
# A tibble: 4 x 7
  date       group  rech weight total_usage Avg_usage Avg_usage_total
  <date>     <int> <dbl>  <dbl>       <dbl>     <dbl>           <dbl>
1 2019-03-09     0  3.29   4.82        3.03     0.628           0.639
2 2019-03-09     1  1.45   1.22        1.16     0.954           0.246
3 2019-03-10     0  1.54   1.62        1.94     1.20            0.413
4 2019-03-10     1  3.15   4.55        4.63     1.02            0.986
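As a quick check against the numbers in the question, here is a minimal sketch that applies the same grouped mutate to the first four summarised rows posted there. The object name usage is only illustrative, and the denominator is assumed to be the summed weight column (the question's worked example divides by 450762 + 2788303), rather than the rech column used in the random example above.

```r
library(dplyr)

# Summarised rows copied from the question (first two dates only).
# `usage` is a hypothetical name chosen for this sketch.
usage <- tibble::tribble(
  ~date,    ~group, ~weight, ~total_usage,
  20190201, 0,      450762,  67184943,
  20190201, 1,      2788303, 385115718,
  20190202, 0,      483959,  60677765,
  20190202, 1,      2413699, 311226351
)

result <- usage %>%
  group_by(date) %>%                                      # one group per date
  mutate(Avg_usage_total = total_usage / sum(weight)) %>% # row value / per-date weight total
  ungroup()

# First two rows, rounded to one decimal place
round(result$Avg_usage_total[1:2], 1)
# 20.7 118.9
```

The two rounded values match the expected result in the question. Because sum(weight) is evaluated inside each date group, every row is divided by the combined weight of group 0 and group 1 for its own date.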
Upvotes: 3