yupper

Reputation: 65

Merge rows depending on 2 column values

region      age   pop
SSC21184    0   209
SSC21184    1   195
SSC21184    2   242
SSC21184    3   248
SSC21185    0   231
SSC21185    1   287
SSC21185    2   268
SSC21185    3   257

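I'm looking to merge the rows within each region into two age groups (under 2, and 2 and over) and sum pop for each group, so it should look something like this: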

region      age_group   pop
SSC21184    <2          404
SSC21184    >=2         490
SSC21185    <2          518
SSC21185    >=2         525

I tried tapply(df$pop, df$agegroup, FUN = mean) %>% as.data.frame(), but I keep getting the error: arguments must have same length

Edit: If possible, how could I plot the population per age group per region, for example as a stacked bar graph? Thank you!

Upvotes: 1

Views: 62

Answers (1)

Ronak Shah

Reputation: 389275

If there are only two age groups, we can use ifelse:

library(dplyr)

df %>%
  group_by(region, age = ifelse(age >=2, '>=2', '<2')) %>%
  summarise(sum = sum(pop))


#  region   age     sum
#  <chr>    <chr> <int>
#1 SSC21184 <2      404
#2 SSC21184 >=2     490
#3 SSC21185 <2      518
#4 SSC21185 >=2     525

If there are many age groups, cut is a more general solution:

df %>%
  group_by(region, age = cut(age, breaks = c(-Inf, 1, Inf), 
                   labels = c('< 2', '>=2'))) %>%
  summarise(sum = sum(pop))
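
To show how cut scales beyond two groups, here is a minimal sketch with a three-way split. The break points and labels are hypothetical and chosen only for illustration; the data frame is rebuilt from the table in the question, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Data copied from the question
df <- data.frame(
  region = rep(c("SSC21184", "SSC21185"), each = 4),
  age    = rep(0:3, times = 2),
  pop    = c(209L, 195L, 242L, 248L, 231L, 287L, 268L, 257L)
)

# Hypothetical three-way split: age 0, ages 1-2, and age 3 or older.
# cut() uses right-closed intervals by default, so the bins are
# (-Inf, 0], (0, 2] and (2, Inf].
res <- df %>%
  group_by(region,
           age_group = cut(age, breaks = c(-Inf, 0, 2, Inf),
                           labels = c("0", "1-2", ">=3"))) %>%
  summarise(pop = sum(pop), .groups = "drop")

print(res)
```

Adding another group only requires one more break point and one more label, which is why cut tends to be easier to maintain than nested ifelse calls.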

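The same logic works with tapply. The error in the question most likely occurs because df has no agegroup column: df$agegroup is then NULL, so its length does not match that of df$pop.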

with(df, tapply(pop, list(region, ifelse(age >=2, '>=2', '<2')), sum))

#          <2 >=2
#SSC21184 404 490
#SSC21185 518 525
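
For the stacked bar graph asked about in the edit, one possible approach is ggplot2, which is an assumption here since the question does not name a plotting package. The sketch below rebuilds the data from the question, summarises it with the ifelse grouping, and maps the age group to the fill colour. The column names age_group and pop and the axis labels are illustrative choices.

```r
library(dplyr)
library(ggplot2)

# Data copied from the question
df <- data.frame(
  region = rep(c("SSC21184", "SSC21185"), each = 4),
  age    = rep(0:3, times = 2),
  pop    = c(209L, 195L, 242L, 248L, 231L, 287L, 268L, 257L)
)

# Total population per region and age group
grouped <- df %>%
  group_by(region, age_group = ifelse(age >= 2, ">=2", "<2")) %>%
  summarise(pop = sum(pop), .groups = "drop")

# geom_col() stacks bars that share an x value by default,
# so each region gets one bar split by age group
p <- ggplot(grouped, aes(x = region, y = pop, fill = age_group)) +
  geom_col() +
  labs(x = "Region", y = "Population", fill = "Age group")

print(p)
```

To place the age groups side by side instead of stacking them, geom_col(position = "dodge") can be used.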

Upvotes: 3
