Reputation: 13335
I have a data frame that is structured like this:
age share
...
19 0.02
20 0.01
21 0.03
22 0.04
...
I want to merge the individual ages into larger cohorts such as <20, 20-24, 25-29, 30-34, >=35 (and sum the shares within each cohort).
Of course this could easily be done by hand, but I can hardly believe there is no dedicated function for it. However, I have not been able to find one. Can you help me?
Reputation: 11893
What you want to use is ?cut. For example:
> myData <- read.table(text="age share
+ 19 0.02
+ 20 0.01
+ 21 0.03
+ 22 0.04", header=TRUE)
>
> myData$ageRange <- cut(myData$age, breaks=c(0, 20, 24, 29, 34, 35, 100))
> myData
age share ageRange
1 19 0.02 (0,20]
2 20 0.01 (0,20]
3 21 0.03 (20,24]
4 22 0.04 (20,24]
Notice that you need to include breakpoints below the smallest value and above the largest value in order for the outer intervals to form properly. Notice further that each breakpoint is a single value (e.g., exactly 20), not a pair such as <=20 and >=21; that is, there cannot be a 'gap' between 20 and 21 such that 20.5 would be left out. By default the intervals are closed on the right, which is why an age of exactly 20 falls in (0,20] above.
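In the output above, age 20 lands in (0,20] because cut() closes intervals on the right by default. If the cohorts should instead match the ones in the question (<20, 20-24, 25-29, 30-34, >=35), one possible variant is to close the intervals on the left with right=FALSE and supply labels. This is only a sketch: the break values, the labels, and the column name cohort are my assumptions, not part of the original code.

```r
# Same toy data as in the example above
myData <- data.frame(age   = c(19, 20, 21, 22),
                     share = c(0.02, 0.01, 0.03, 0.04))

# right = FALSE gives left-closed intervals: [0,20), [20,25), [25,30), [30,35), [35,Inf)
# Inf as the last break avoids having to guess a maximum age
myData$cohort <- cut(myData$age,
                     breaks = c(0, 20, 25, 30, 35, Inf),
                     right  = FALSE,
                     labels = c("<20", "20-24", "25-29", "30-34", ">=35"))

myData
```

With these breaks, 19 is labeled <20, while 20, 21 and 22 are all labeled 20-24.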
From there, if you want the shares in rows categorized under the same ageRange to be summed, you can create a new data frame:
> newData <- aggregate(share~ageRange, myData, sum)
> newData
ageRange share
1 (0,20] 0.03
2 (20,24] 0.07
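Note that aggregate() only returns the ranges that actually occur in the data, which is why just two rows appear above. If you would rather see every range, with 0 for the empty ones, xtabs() may be an option. The following is a sketch using the same toy data and breaks as above; the names totals and allRanges are only illustrative.

```r
# Rebuild the example data and ranges from above
myData <- data.frame(age   = c(19, 20, 21, 22),
                     share = c(0.02, 0.01, 0.03, 0.04))
myData$ageRange <- cut(myData$age, breaks = c(0, 20, 24, 29, 34, 35, 100))

# xtabs() sums 'share' within each level of 'ageRange';
# levels with no observations are kept and get a total of 0
totals <- xtabs(share ~ ageRange, data = myData)

# Convert to a data frame; the summed shares end up in a column called 'share'
allRanges <- as.data.frame(totals, responseName = "share")
allRanges
```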