Reputation: 333
I have data for single ages 12 to 54, each with an associated year and age-specific fertility rate (ASFR). The years run from 1933 to 1987. The data are structured like this:
year | age | Asfr |
---|---|---|
1933 | 12 | .00004 |
1933 | 13 | .00044 |
1933 | 14 | .00177 |
1933 | 15 | .00672 |
1933 | 16 | .01875 |
1933 | 17 | .03846 |
1933 | 18 | .06586 |
1933 | 19 | .08719 |
... | ... | ... |
1933 | 49 | .00037 |
1933 | 50 | .00009 |
1933 | 51 | .00003 |
1933 | 52 | .00003 |
1933 | 53 | .00003 |
1933 | 54 | .00002 |
I need code that collapses these single ages into age groups with the following labels:
"15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"
The 15-19 group should be the sum over ages 12, 13, 14, 15, 16, 17, 18, 19.
The 20-24 group should be the sum over ages 20, 21, 22, 23, 24.
Finally, the last group (45-49) should be the sum over ages 45, 46, 47, 48, 49, 50, 51, 52, 53, 54.
I would really appreciate any help. Thank you in advance.
Upvotes: 2
Views: 90
Reputation: 206
Here's a possible solution:
# Load the tidyverse (dplyr alone is enough)
library(tidyverse)
# Create the age groups, then sum Asfr within each year and age group
df %>%
  mutate(age_groups = cut(age,  # refer to the column directly, not df$age
                          breaks = c(12, 20, 25, 30, 35, 40, 45, 55),
                          right = FALSE)) %>%  # intervals are [lower, upper)
  group_by(year, age_groups) %>%
  summarise(asfr_total = sum(Asfr))
You should see something like this:
year age_groups asfr_total
<dbl> <fct> <dbl>
1 1933 [12,20) 4.32
2 1933 [20,25) 2.33
3 1933 [25,30) 2.68
4 1933 [30,35) 2.89
5 1933 [35,40) 2.23
6 1933 [40,45) 2.85
7 1933 [45,55) 6.05
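If you want the labels from the question ("15-19", ..., "45-49") instead of the interval notation, `cut()` accepts a `labels` argument. The following is a minimal sketch under stated assumptions: the toy data frame (one year, a constant rate of 0.001 for every age) is hypothetical and only there to make the example runnable, and `n_ages` is an extra column added to check how many single ages landed in each group.

```r
library(dplyr)

# Hypothetical toy data: one year, ages 12 to 54, constant rate (not real ASFR values)
df <- data.frame(year = 1933L, age = 12:54, Asfr = 0.001)

grouped <- df %>%
  mutate(age_group = cut(age,
                         breaks = c(12, 20, 25, 30, 35, 40, 45, 55),
                         labels = c("15-19", "20-24", "25-29", "30-34",
                                    "35-39", "40-44", "45-49"),
                         right = FALSE)) %>%   # [12,20), [20,25), ..., [45,55)
  group_by(year, age_group) %>%
  summarise(asfr_total = sum(Asfr),
            n_ages = n(),                      # number of single ages per group
            .groups = "drop")

grouped
```

With these breaks, the first group collects ages 12 to 19 (8 ages) and the last collects ages 45 to 54 (10 ages), which is the grouping asked for in the question.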
Upvotes: 1
Reputation: 7385
You can use case_when from dplyr:
library(dplyr)
df %>%
  mutate(age_group = case_when(age %in% c(12:19) ~ "15-19",
                               age %in% c(20:24) ~ "20-24",
                               age %in% c(25:29) ~ "25-29",
                               age %in% c(30:34) ~ "30-34",
                               age %in% c(35:39) ~ "35-39",
                               age %in% c(40:44) ~ "40-44",
                               age %in% c(45:49) ~ "45-49",
                               age > 49 ~ "50+")) %>%
  group_by(age_group, year) %>%
  summarize(total_asfr = sum(Asfr),
            age_group_n = n()) %>%
  ungroup()
This gives us:
# A tibble: 5 × 4
  age_group  year total_asfr age_group_n
  <chr>     <int>      <dbl>       <int>
1 15-19      1933    0.0385            2
2 20-24      1933    0.00044           1
3 30-34      1933    0.00177           1
4 45-49      1933    0.00672           1
5 50+        1933    0.0188            1
Using sample data:
df <- structure(list(year = c(1933L, 1933L, 1933L, 1933L, 1933L, 1933L
), age = c(12L, 23L, 34L, 45L, 56L, 17L), Asfr = c(4e-05, 0.00044,
0.00177, 0.00672, 0.01875, 0.03846)),
row.names = c(NA, -6L),
class = "data.frame")
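Note that the code above puts ages over 49 into a separate "50+" group, whereas the question asks for those ages to be folded into "45-49". A possible variant, sketched below, relies on `case_when` returning the first matching condition, so each cutoff only needs an upper bound. The object name `result` is just for illustration. The sample data contains an age of 56, outside the 12 to 54 range described in the question; this sketch assigns it to the last group as well.

```r
library(dplyr)

# Same sample values as above, rebuilt with data.frame() so the block is self-contained
df <- data.frame(year = 1933L,
                 age  = c(12L, 23L, 34L, 45L, 56L, 17L),
                 Asfr = c(4e-05, 0.00044, 0.00177, 0.00672, 0.01875, 0.03846))

result <- df %>%
  mutate(age_group = case_when(age <= 19 ~ "15-19",   # ages 12 to 19
                               age <= 24 ~ "20-24",
                               age <= 29 ~ "25-29",
                               age <= 34 ~ "30-34",
                               age <= 39 ~ "35-39",
                               age <= 44 ~ "40-44",
                               TRUE      ~ "45-49")) %>%  # everything from 45 up
  group_by(age_group, year) %>%
  summarize(total_asfr = sum(Asfr),
            age_group_n = n(),
            .groups = "drop")

result
```

Here the two rows with ages 45 and 56 end up in the same "45-49" group, so the result has four rows instead of five.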
Upvotes: 2