Reputation: 485
I have a data frame named df. As a first step, I converted age into an age group and then summed pop for each combination of agegroup and gender.
# data_frame() is deprecated in tibble; a base data.frame works the same here
df <- data.frame(age = c(0,1,3,5,6,29,43,12,1,3,5,12,29,43,0,6), pop = c(12,11,33,45,56,54,67,76,65,11,78,90,112,29,70,60), gender = c(2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1))
Changing age into age groups:
x <- df$age %/% 5
x <- pmax(0, pmin(20, x))
df$agegroup<- c(paste(0:19*5, 1:20*5-1, sep="-"), "+100")[x+1]
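Summing pop by gender and age group:
# the formula is passed unnamed because the name of this argument differs across R versions
df1 <- aggregate(pop ~ gender + agegroup, data = df, FUN = sum)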
   gender agegroup pop
1       1      0-4 146
2       2      0-4  56
3       1    10-14  90
4       2    10-14  76
5       1    25-29 112
6       2    25-29  54
7       1    40-44  29
8       2    40-44  67
9       1      5-9 138
10      2      5-9 101
As shown in df1, the age group 5-9 appears after 40-44, because agegroup is sorted as text. I want the age groups in numeric order. My desired output would look like this:
   gender agegroup pop
1       1      0-4 146
2       2      0-4  56
3       1      5-9 138
4       2      5-9 101
5       1    10-14  90
6       2    10-14  76
7       1    25-29 112
8       2    25-29  54
9       1    40-44  29
10      2    40-44  67
Upvotes: 1
Views: 481
Reputation: 887221
We can use mixedorder() from the gtools package, which orders strings containing embedded numbers by their numeric value:
df1[gtools::mixedorder(df1$agegroup),]
   gender agegroup pop
1       1      0-4 146
2       2      0-4  56
9       1      5-9 138
10      2      5-9 101
3       1    10-14  90
4       2    10-14  76
5       1    25-29 112
6       2    25-29  54
7       1    40-44  29
8       2    40-44  67
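If gtools is not available, a similar ordering can be sketched in base R by sorting on the number before the hyphen. This assumes every label has the form "lower-upper" and is not a description of how mixedorder() works internally; df1 is rebuilt by hand here only so the snippet runs on its own, and df1_sorted is an illustrative name.

```r
# Rebuild the aggregated result from the question so this runs standalone.
df1 <- data.frame(
  gender   = rep(1:2, times = 5),
  agegroup = rep(c("0-4", "10-14", "25-29", "40-44", "5-9"), each = 2),
  pop      = c(146, 56, 90, 76, 112, 54, 29, 67, 138, 101),
  stringsAsFactors = FALSE
)

# Lower bound of each group: drop everything from the hyphen onward.
lower <- as.numeric(sub("-.*$", "", df1$agegroup))

# order() leaves ties in their original order, so gender 1 stays
# ahead of gender 2 within each age group.
df1_sorted <- df1[order(lower), ]
df1_sorted
```

Like mixedorder(), this only reorders the rows; agegroup itself is still a character column, so later operations that sort it will fall back to alphabetical order.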
Upvotes: 1
Reputation: 389045
I am kind of reinventing the wheel here, since you have already solved the grouping step, but you can use cut() and pass breaks and labels to it.
The benefit of cut() is that it returns a factor whose levels are already in the order you want, so you only need to arrange() by it.
library(dplyr)
x1 <- c(0, seq(4, 100, 5))
labels <- c(paste(x1[-length(x1)] + 1, x1[-1], sep = '-'), '100+')
labels[1] <- '0-4'
df %>%
  group_by(gender, agegroup = cut(age, c(x1, Inf), labels, include.lowest = TRUE)) %>%
  summarise(pop = sum(pop)) %>%
  ungroup() %>%
  arrange(agegroup)
# gender agegroup pop
# <dbl> <fct> <dbl>
# 1 1 0-4 146
# 2 2 0-4 56
# 3 1 5-9 138
# 4 2 5-9 101
# 5 1 10-14 90
# 6 2 10-14 76
# 7 1 25-29 112
# 8 2 25-29 54
# 9 1 40-44 29
#10 2 40-44 67
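To see why the factor returned by cut() sorts correctly while plain strings do not, here is a small base R sketch. The ages, breaks and labels are made up for illustration and are not the ones from the question.

```r
# Illustrative values only (not the question's data).
ages   <- c(43, 5, 0, 12, 3)
breaks <- c(0, 4, 9, 14, Inf)
labels <- c("0-4", "5-9", "10-14", "15+")

# include.lowest = TRUE keeps age 0 inside the first interval [0, 4].
grp <- cut(ages, breaks = breaks, labels = labels, include.lowest = TRUE)

# The levels keep the order in which the labels were supplied.
levels(grp)

# Sorting the factor follows the level order...
sort(grp)

# ...whereas sorting the same labels as text puts "10-14" before "5-9".
sort(as.character(grp))
```

The design point is that the ordering lives in the factor levels, so anything that sorts or groups by the column afterwards picks it up without extra work.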
Upvotes: 1
Reputation: 206253
You're going to want to set agegroup to a factor and specify the factor order. One way to do this is with reorder(). For example:
df$agegroup <- reorder(df$agegroup,
                       as.numeric(gsub("-\\d+", "", df$agegroup)))
We use gsub() to strip the hyphen and the second number, and then sort the levels by the numeric value of the first number.
Once you've updated the level order to be what you want, rerunning the aggregate() call should give the results in that order.
levels(df$agegroup)
# [1] "0-4" "5-9" "10-14" "25-29" "40-44"
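Putting the pieces together, the following sketch rebuilds the question's data, applies the reorder() step, and reruns the aggregation. It uses a base data.frame and passes the formula unnamed, which are choices made here for portability rather than part of the original answer.

```r
# Data from the question, as a base data.frame.
df <- data.frame(
  age    = c(0, 1, 3, 5, 6, 29, 43, 12, 1, 3, 5, 12, 29, 43, 0, 6),
  pop    = c(12, 11, 33, 45, 56, 54, 67, 76, 65, 11, 78, 90, 112, 29, 70, 60),
  gender = rep(c(2, 1), each = 8)
)

# Age-group labels, built as in the question.
x <- pmax(0, pmin(20, df$age %/% 5))
df$agegroup <- c(paste(0:19 * 5, 1:20 * 5 - 1, sep = "-"), "+100")[x + 1]

# Turn agegroup into a factor whose levels follow the first number.
df$agegroup <- reorder(df$agegroup,
                       as.numeric(gsub("-\\d+", "", df$agegroup)))

# aggregate() now sorts the groups by factor level, not alphabetically.
df1 <- aggregate(pop ~ gender + agegroup, data = df, FUN = sum)
df1
```

Because agegroup is now a factor, the row order of df1 should match the desired output in the question without a separate sorting step.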
Upvotes: 2