mehmo

Reputation: 485

Changing the order of age groups into numeric order

I have a data frame named df. As a first step I converted age into an age group, and then summed pop for each combination of age group and gender.

df <- data.frame(age = c(0,1,3,5,6,29,43,12,1,3,5,12,29,43,0,6), pop = c(12,11,33,45,56,54,67,76,65,11,78,90,112,29,70,60), gender = c(2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1))

Changing age into age group:

x <- df$age %/% 5
x <- pmax(0, pmin(20, x))
df$agegroup<- c(paste(0:19*5, 1:20*5-1, sep="-"), "+100")[x+1]

Sum of pop for each gender and age group:

df1 <- aggregate(pop ~ gender + agegroup, data = df, FUN = sum)

   gender agegroup pop
1       1      0-4 146
2       2      0-4  56
3       1    10-14  90
4       2    10-14  76
5       1    25-29 112
6       2    25-29  54
7       1    40-44  29
8       2    40-44  67
9       1      5-9 138
10      2      5-9 101

As df1 shows, the age group 5-9 comes after 40-44, but I want the age groups in numeric order. My desired output is:

   gender agegroup pop
1       1      0-4 146
2       2      0-4  56
3       1      5-9 138
4       2      5-9 101
5       1    10-14  90
6       2    10-14  76
7       1    25-29 112
8       2    25-29  54
9       1    40-44  29
10      2    40-44  67

Upvotes: 1

Views: 481

Answers (3)

akrun

Reputation: 887221

We can use mixedorder() from gtools, which orders strings with embedded numbers by their numeric value:

df1[gtools::mixedorder(df1$agegroup),]
   gender agegroup pop
1       1      0-4 146
2       2      0-4  56
9       1      5-9 138
10      2      5-9 101
3       1    10-14  90
4       2    10-14  76
5       1    25-29 112
6       2    25-29  54
7       1    40-44  29
8       2    40-44  67
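
If adding gtools is not an option, roughly the same result can be had in base R. This is only a sketch of a different technique (ordering by the numeric lower bound of each label), not what mixedorder() does internally. It rebuilds df1 by hand from the question's output so it runs on its own, and `lower` and `df1_sorted` are names made up for the illustration.

```r
# Rebuild the aggregated result from the question so the example is self-contained.
df1 <- data.frame(
  gender   = rep(1:2, times = 5),
  agegroup = rep(c("0-4", "10-14", "25-29", "40-44", "5-9"), each = 2),
  pop      = c(146, 56, 90, 76, 112, 54, 29, 67, 138, 101),
  stringsAsFactors = FALSE
)

# Plain character sorting compares the labels as text, which is why
# "5-9" ends up after "40-44" in the question.
sort(unique(df1$agegroup))

# Strip everything from the hyphen onwards ("10-14" -> "10"), convert to a
# number, and order the rows by that lower bound, then by gender.
lower <- as.numeric(sub("-.*$", "", df1$agegroup))
df1_sorted <- df1[order(lower, df1$gender), ]
df1_sorted
```

As with the mixedorder() call, the original row names are kept, so rows 9 and 10 show up in third and fourth position.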

Upvotes: 1

Ronak Shah

Reputation: 389045

I am kind of reinventing the wheel here for something that you have already solved, but you can use cut() and pass breaks and labels to it.

The benefit of using cut() is that it returns a factor whose levels are already in the order you want; you just need to arrange by it.

library(dplyr)

x1 <- c(0, seq(4, 100, 5))
labels <- c(paste(x1[-length(x1)] + 1, x1[-1], sep = '-'), '100+')
labels[1] <- '0-4'

df %>%
  group_by(gender, agegroup = cut(age, c(x1, Inf), labels, include.lowest = TRUE)) %>%
  summarise(pop = sum(pop)) %>%
  ungroup %>%
  arrange(agegroup)

#   gender agegroup   pop
#    <dbl> <fct>    <dbl>
# 1      1 0-4        146
# 2      2 0-4         56
# 3      1 5-9        138
# 4      2 5-9        101
# 5      1 10-14       90
# 6      2 10-14       76
# 7      1 25-29      112
# 8      2 25-29       54
# 9      1 40-44       29
#10      2 40-44       67
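
To see the point about factor levels in isolation, here is a small base R sketch. It is a variation on the code above rather than a copy of it: it assumes breaks at 0, 5, 10, ... with right = FALSE (so a non-integer age such as 4.5 would still fall into 0-4), and the `breaks`, `labels` and `agegroup` objects exist only for this illustration.

```r
# Ages taken from the question's data.
age <- c(0, 1, 3, 5, 6, 29, 43, 12)

# Break points 0, 5, 10, ..., 100, Inf. With right = FALSE the intervals
# are [0,5), [5,10), ..., [100,Inf).
breaks <- c(seq(0, 100, by = 5), Inf)
labels <- c(paste(seq(0, 95, by = 5), seq(4, 99, by = 5), sep = "-"), "100+")

agegroup <- cut(age, breaks = breaks, labels = labels, right = FALSE)

# The levels follow the order of `labels`, not alphabetical order.
head(levels(agegroup), 4)

# Sorting a factor goes by level order, so "5-9" comes before "10-14".
as.character(sort(unique(agegroup)))
```

Because the ordering lives in the factor levels, anything that sorts or groups by agegroup (arrange(), aggregate(), plotting axes) should pick it up without further work.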

Upvotes: 1

MrFlick

Reputation: 206253

You're going to want to set agegroup to a factor and specify the factor order. One way to do this is with reorder(). For example:

df$agegroup <- reorder(df$agegroup, 
   as.numeric(gsub("-\\d+","", df$agegroup)))

We use gsub() to take off the second number, and then we can use that to sort by the numeric value of the first number.

Once you've updated the level order to be what you want, re-running the aggregate() call should give the results in the order you want.

levels(df$agegroup)
# [1] "0-4"   "5-9"   "10-14" "25-29" "40-44"
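
Putting the pieces together, a self-contained sketch could look like the following. It assumes the data and age-group code from the question (built here with data.frame()), applies the reorder() step, and then runs the aggregation again.

```r
# Data from the question.
df <- data.frame(
  age    = c(0, 1, 3, 5, 6, 29, 43, 12, 1, 3, 5, 12, 29, 43, 0, 6),
  pop    = c(12, 11, 33, 45, 56, 54, 67, 76, 65, 11, 78, 90, 112, 29, 70, 60),
  gender = rep(c(2, 1), each = 8)
)

# Age groups as in the question (character labels at this point).
x <- pmax(0, pmin(20, df$age %/% 5))
df$agegroup <- c(paste(0:19 * 5, 1:20 * 5 - 1, sep = "-"), "+100")[x + 1]

# Turn the labels into a factor whose levels are ordered by the lower bound.
df$agegroup <- reorder(df$agegroup, as.numeric(gsub("-\\d+", "", df$agegroup)))

# aggregate() lays out the groups in factor-level order, so no extra sort is needed.
df1 <- aggregate(pop ~ gender + agegroup, data = df, FUN = sum)
df1
```

The rows come out as 0-4, 5-9, 10-14, 25-29, 40-44, with gender 1 before gender 2 inside each group, which matches the desired output in the question.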

Upvotes: 2
