Reputation: 426
I would like to take the sum of a column variable when the other two columns match and then adding a new row with different name for this sum result. I'm able to get the first part but unsure how to add the new row for the result obtained...
Here's my dataframe:
df <- structure(list(measure_name = c("Prevalence", "Prevalence", "Incidence",
"Incidence", "Deaths", "Deaths", "YLLs (Years of Life Lost)",
"YLLs (Years of Life Lost)", "YLDs (Years Lived with Disability)",
"YLDs (Years Lived with Disability)", "DALYs (Disability-Adjusted Life Years)",
"DALYs (Disability-Adjusted Life Years)", "Prevalence", "Prevalence",
"Incidence", "Incidence", "YLDs (Years Lived with Disability)",
"YLDs (Years Lived with Disability)", "DALYs (Disability-Adjusted Life Years)",
"DALYs (Disability-Adjusted Life Years)"), age_name = c("1-4 years",
"5-9 years", "1-4 years", "5-9 years", "1-4 years", "5-9 years",
"1-4 years", "5-9 years", "1-4 years", "5-9 years", "1-4 years",
"5-9 years", "1-4 years", "5-9 years", "1-4 years", "5-9 years",
"1-4 years", "5-9 years", "1-4 years", "5-9 years"), cause_name = c("Asthma",
"Asthma", "Asthma", "Asthma", "Asthma", "Asthma", "Asthma", "Asthma",
"Asthma", "Asthma", "Asthma", "Asthma", "Attention-deficit/hyperactivity disorder",
"Attention-deficit/hyperactivity disorder", "Attention-deficit/hyperactivity disorder",
"Attention-deficit/hyperactivity disorder", "Attention-deficit/hyperactivity disorder",
"Attention-deficit/hyperactivity disorder", "Attention-deficit/hyperactivity disorder",
"Attention-deficit/hyperactivity disorder"), val = c(21809765.44,
33602368.48, 10004723.65, 6417738.685, 6101.934992, 1699.9247,
524901.7761, 138969.73, 879880.8571, 1355302.883, 1404782.633,
1494272.613, 1367581.312, 14033704.42, 1314270.786, 2654899.128,
16774.31306, 171847.3209, 16774.31306, 171847.3209)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
> df
# A tibble: 20 × 4
measure_name age_name cause_name val
<chr> <chr> <chr> <dbl>
1 Prevalence 1-4 years Asthma 21809765.
2 Prevalence 5-9 years Asthma 33602368.
3 Incidence 1-4 years Asthma 10004724.
4 Incidence 5-9 years Asthma 6417739.
5 Deaths 1-4 years Asthma 6102.
6 Deaths 5-9 years Asthma 1700.
7 YLLs (Years of Life Lost) 1-4 years Asthma 524902.
8 YLLs (Years of Life Lost) 5-9 years Asthma 138970.
9 YLDs (Years Lived with Disability) 1-4 years Asthma 879881.
10 YLDs (Years Lived with Disability) 5-9 years Asthma 1355303.
11 DALYs (Disability-Adjusted Life Years) 1-4 years Asthma 1404783.
12 DALYs (Disability-Adjusted Life Years) 5-9 years Asthma 1494273.
13 Prevalence 1-4 years Attention-deficit/hyperactivity disorder 1367581.
14 Prevalence 5-9 years Attention-deficit/hyperactivity disorder 14033704.
15 Incidence 1-4 years Attention-deficit/hyperactivity disorder 1314271.
16 Incidence 5-9 years Attention-deficit/hyperactivity disorder 2654899.
17 YLDs (Years Lived with Disability) 1-4 years Attention-deficit/hyperactivity disorder 16774.
18 YLDs (Years Lived with Disability) 5-9 years Attention-deficit/hyperactivity disorder 171847.
19 DALYs (Disability-Adjusted Life Years) 1-4 years Attention-deficit/hyperactivity disorder 16774.
20 DALYs (Disability-Adjusted Life Years) 5-9 years Attention-deficit/hyperactivity disorder 171847.
Desired output using prevalence of asthma as an example:
measure_name age_name cause_name val
<chr> <chr> <chr> <dbl>
1 Prevalence 1-4 years Asthma 21809765.
2 Prevalence 5-9 years Asthma 33602368.
3 Prevalence 1–9 years Asthma 55412133.
What I have tried:
Total <- df %>%
group_by(measure_name, cause_name) %>%
summarise(val = sum(val, na.rm = TRUE)) %>% arrange(cause_name, factor(measure_name, levels = c("Prevalence",
"Incidence",
"Deaths",
"YLLs (Years of Life Lost)",
"YLDs (Years Lived with Disability)",
"DALYs (Disability-Adjusted Life Years)")))
# A tibble: 10 × 3
# Groups: measure_name [6]
measure_name cause_name val
<chr> <chr> <dbl>
1 Prevalence Asthma 55412134.
2 Incidence Asthma 16422462.
3 Deaths Asthma 7802.
4 YLLs (Years of Life Lost) Asthma 663872.
5 YLDs (Years Lived with Disability) Asthma 2235184.
6 DALYs (Disability-Adjusted Life Years) Asthma 2899055.
7 Prevalence Attention-deficit/hyperactivity disorder 15401286.
8 Incidence Attention-deficit/hyperactivity disorder 3969170.
9 YLDs (Years Lived with Disability) Attention-deficit/hyperactivity disorder 188622.
10 DALYs (Disability-Adjusted Life Years) Attention-deficit/hyperactivity disorder 188622.
I get the total but would like to create new row with new age_name that is 1–9 years (pls see above desired output using prevalence of asthma as an example). Can someone help pls? Thanks.
Upvotes: 0
Views: 60
Reputation: 26
If I understand your question correctly, you can just sum val
, change the value of age_name
, and then use bind_rows
to tie it up with the original data set. After that it is just a matter of organizing the rows to appear in the way you want it. Here is such a solution:
df %>%
group_by(measure_name, cause_name) %>%
summarise(val = sum(val, na.rm = TRUE), age_name = "1-9 years", .groups = "drop") %>%
bind_rows(df) %>%
arrange(cause_name, factor(measure_name, levels = c("Prevalence",
"Incidence",
"Deaths",
"YLLs (Years of Life Lost)",
"YLDs (Years Lived with Disability)",
"DALYs (Disability-Adjusted Life Years)")),
age_name = factor(age_name, levels = c("1-4 years", "5-9 years", "1-9 years")))
Upvotes: 1