Mohammad Haddadi

Reputation: 333

Creating new age groups

I have data by age (12 to 54) with two related variables, year and ASFR (age-specific fertility rate). The years run from 1933 to 1987. The data are structured like this:

year age Asfr
1933 12 .00004
1933 13 .00044
1933 14 .00177
1933 15 .00672
1933 16 .01875
1933 17 .03846
1933 18 .06586
1933 19 .08719
... ... ...
1933 49 .00037
1933 50 .00009
1933 51 .00003
1933 52 .00003
1933 53 .00003
1933 54 .00002

Now I need code that aggregates this data into age groups with the following structure:

"15-19" , "20-24", "25-29", "30-34", "35-39" ,"40-44", "45-49"

where the 15-19 group is the sum over ages 12, 13, 14, 15, 16, 17, 18 and 19,

the 20-24 group is the sum over ages 20, 21, 22, 23 and 24, and so on,

and the last group (45-49) is the sum over ages 45, 46, 47, 48, 49, 50, 51, 52, 53 and 54.

I would really appreciate it if someone could help me. Thank you so much in advance.

Upvotes: 2

Views: 90

Answers (2)

gurezende

Reputation: 206

Here's a possible solution:

# Load dplyr (or the whole tidyverse) for mutate(), group_by() and summarise()
library(tidyverse)

# Bin ages into left-closed intervals [12,20), [20,25), ..., [45,55),
# then sum Asfr within each year and age group
df %>%
  mutate(age_groups = cut(age,
                          breaks = c(12, 20, 25, 30, 35, 40, 45, 55),
                          right = FALSE)) %>%
  group_by(year, age_groups) %>%
  summarise(asfr_total = sum(Asfr))

You should see something like this (the totals below are illustrative, not computed from the question's data):

   year age_groups asfr_total
  <dbl> <fct>           <dbl>
1  1933 [12,20)          4.32
2  1933 [20,25)          2.33
3  1933 [25,30)          2.68
4  1933 [30,35)          2.89
5  1933 [35,40)          2.23
6  1933 [40,45)          2.85
7  1933 [45,55)          6.05
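
If you would rather have the labels from the question ("15-19", ..., "45-49") than the interval notation, cut() also takes a labels argument. Below is a minimal sketch of that variant. The toy data frame is made up purely so the example runs on its own (constant Asfr values, two years); only the column names year, age and Asfr come from the question.

```r
library(dplyr)

# Made-up toy data using the question's column names.
# The Asfr values are NOT real rates; they are constant so sums are easy to check.
df <- data.frame(
  year = rep(c(1933L, 1934L), each = 43),
  age  = rep(12:54, times = 2),
  Asfr = 0.001
)

# One label per interval: 8 breaks define 7 intervals
age_labels <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")

result <- df %>%
  mutate(age_group = cut(age,
                         breaks = c(12, 20, 25, 30, 35, 40, 45, 55),
                         labels = age_labels,
                         right  = FALSE)) %>%   # intervals are [12,20), ..., [45,55)
  group_by(year, age_group) %>%
  summarise(asfr_total = sum(Asfr), .groups = "drop")

print(result)
```

With these breaks, ages 12 to 14 end up in "15-19" and ages 50 to 54 end up in "45-49", which is the grouping the question asks for. Any age outside 12 to 54 would become NA.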

Upvotes: 1

Matt

Reputation: 7385

You can use case_when from dplyr:

library(dplyr)

df %>% 
  # Map each age to a group label; ages 12-14 are folded into "15-19"
  mutate(age_group = case_when(age %in% 12:19 ~ "15-19",
                               age %in% 20:24 ~ "20-24",
                               age %in% 25:29 ~ "25-29",
                               age %in% 30:34 ~ "30-34",
                               age %in% 35:39 ~ "35-39",
                               age %in% 40:44 ~ "40-44",
                               age %in% 45:49 ~ "45-49",
                               age > 49 ~ "50+")) %>%  # ages above 49 get their own group
  group_by(age_group, year) %>% 
  # Sum Asfr and count the rows in each age group and year
  summarize(total_asfr = sum(Asfr),
            age_group_n = n()) %>% 
  ungroup()

This gives us:

# A tibble: 5 × 4
  age_group  year total_asfr age_group_n
  <chr>     <int>      <dbl>       <int>
1 15-19      1933    0.0385            2
2 20-24      1933    0.00044           1
3 30-34      1933    0.00177           1
4 45-49      1933    0.00672           1
5 50+        1933    0.0188            1

Using sample data:

df <- structure(list(year = c(1933L, 1933L, 1933L, 1933L, 1933L, 1933L),
                     age = c(12L, 23L, 34L, 45L, 56L, 17L),
                     Asfr = c(4e-05, 0.00044, 0.00177, 0.00672, 0.01875, 0.03846)),
                row.names = c(NA, -6L),
                class = "data.frame")
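
Note that the question asks for ages 45 to 54 to be summed into the last group, while the case_when() call above gives ages over 49 a separate "50+" label. Here is a sketch of a variant that follows the question's grouping. It assumes every age lies between 12 and 54, and the toy data is made up so the example is self-contained. The middle groups are labelled arithmetically instead of being listed one by one, which is just a shortcut, not something the question requires.

```r
library(dplyr)

# Made-up toy data (not real rates), with ages at the edges of several groups
df <- data.frame(
  year = 1933L,
  age  = c(12L, 19L, 20L, 44L, 45L, 49L, 50L, 54L),
  Asfr = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)
)

result <- df %>%
  mutate(age_group = case_when(
    age <= 19 ~ "15-19",   # assumes no ages below 12 are present
    age >= 45 ~ "45-49",   # assumes no ages above 54 are present
    # Remaining ages 20-44: build "20-24", "25-29", ... from the 5-year floor
    TRUE ~ paste(5 * (age %/% 5), 5 * (age %/% 5) + 4, sep = "-")
  )) %>%
  group_by(year, age_group) %>%
  summarise(total_asfr = sum(Asfr),
            age_group_n = n(),
            .groups = "drop")

print(result)
```

Here ages 50 and 54 are counted in "45-49" (four rows in total), and ages 12 and 19 both land in "15-19".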

Upvotes: 2
