region age pop
SSC21184 0 209
SSC21184 1 195
SSC21184 2 242
SSC21184 3 248
SSC21185 0 231
SSC21185 1 287
SSC21185 2 268
SSC21185 3 257
I'm looking to group the ages into two groups (<2 and >=2) and sum the population within each region, so it should look something like this:
region age_group pop
SSC21184 <2 404
SSC21184 >=2 490
SSC21185 <2 518
SSC21185 >=2 525
I've tried tapply(df$pop, df$agegroup, FUN = mean) %>% as.data.frame(), but I keep getting the error: arguments must have same length
Edit: If possible, how could I plot the population per age group per region, for example as a stacked bar graph? Thank you!
If you only have two age groups, you can use ifelse:
library(dplyr)
df %>%
  group_by(region, age = ifelse(age >= 2, '>=2', '<2')) %>%
  summarise(sum = sum(pop))
#   region   age   sum
#   <chr>    <chr> <int>
# 1 SSC21184 <2    404
# 2 SSC21184 >=2   490
# 3 SSC21185 <2    518
# 4 SSC21185 >=2   525
A more general solution, if you have a large number of age groups, is cut:
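df %>%
  # right = FALSE gives the intervals [-Inf, 2) and [2, Inf),
  # so the labels also hold for non-integer ages
  group_by(region, age = cut(age, breaks = c(-Inf, 2, Inf), right = FALSE,
                             labels = c('<2', '>=2'))) %>%
  summarise(sum = sum(pop))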
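The same logic works with tapply as well. Your original call fails because df has no agegroup column, so df$agegroup is NULL and its length (0) doesn't match the length of df$pop; computing the groups inline avoids that: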
with(df, tapply(pop, list(region, ifelse(age >=2, '>=2', '<2')), sum))
# <2 >=2
#SSC21184 404 490
#SSC21185 518 525
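tapply returns a region-by-age-group matrix rather than the long table shown in the question. A minimal sketch of one way to reshape it in base R, assuming the sample data is rebuilt as df: naming the elements of the INDEX list names the matrix dimensions, and as.data.frame(as.table(...)) then turns each cell into a row. The column names region, age_group and pop are chosen here to match the desired output.

```r
# Sample data from the question
df <- data.frame(
  region = rep(c("SSC21184", "SSC21185"), each = 4),
  age    = rep(0:3, times = 2),
  pop    = c(209, 195, 242, 248, 231, 287, 268, 257)
)

# Naming the INDEX list elements gives the result named dimensions
m <- with(df, tapply(pop,
                     list(region    = region,
                          age_group = ifelse(age >= 2, ">=2", "<2")),
                     sum))

# as.table() marks the matrix as a table; as.data.frame() on a table returns
# one row per cell, with the values in the column named by responseName
# (the default name would be "Freq").
long <- as.data.frame(as.table(m), responseName = "pop")

# Sort by region, then age group, and reset the row names
long <- long[order(long$region, long$age_group), ]
rownames(long) <- NULL
long
```

In the result, region and age_group are factors; wrap them in as.character() if you need plain strings.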
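For the stacked bar graph asked about in the edit, a minimal base-R sketch (assuming the sample data rebuilt as df and the same tapply matrix) could use barplot. barplot draws one bar per column of a matrix and stacks its rows, so the matrix is transposed to get one bar per region with the age groups stacked inside it. If you already use ggplot2, geom_col() on a long data frame stacks by default and would be an alternative; the version below just avoids the extra dependency.

```r
# Sample data from the question
df <- data.frame(
  region = rep(c("SSC21184", "SSC21185"), each = 4),
  age    = rep(0:3, times = 2),
  pop    = c(209, 195, 242, 248, 231, 287, 268, 257)
)

# Totals per region (rows) and age group (columns)
m <- with(df, tapply(pop,
                     list(region    = region,
                          age_group = ifelse(age >= 2, ">=2", "<2")),
                     sum))

# After t(), columns are regions (one bar each) and rows are age groups
# (stacked segments). legend.text = TRUE builds the legend from the row names.
mids <- barplot(t(m),
                legend.text = TRUE,
                xlab = "region",
                ylab = "population",
                main = "Population per age group per region")
```

barplot invisibly returns the bar midpoints, which can be handy for adding labels with text().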