speendo

merge rows into groups

I have a data frame that is structured like this:

age  share
...
 19   0.02
 20   0.01
 21   0.03
 22   0.04
...

I want to merge the individual ages into larger cohorts such as <20, 20-24, 25-29, 30-34 and >=35, summing the shares within each cohort.

Of course this could easily be done by hand, but I can hardly believe there is no dedicated function for it. However, I have not been able to find one. Can you help me?

Answers (1)

gung - Reinstate Monica

What you want is cut() (see ?cut). For example:

> myData <- read.table(text="age  share
+  19   0.02
+  20   0.01
+  21   0.03
+  22   0.04", header=TRUE)
> 
> myData$ageRange <- cut(myData$age, breaks=c(-Inf, 20, 25, 30, 35, Inf), right=FALSE,
+                        labels=c("<20", "20-24", "25-29", "30-34", ">=35"))
> myData
  age share ageRange
1  19  0.02      <20
2  20  0.01    20-24
3  21  0.03    20-24
4  22  0.04    20-24

Notice that you need to include breakpoints below the lowest value and above the highest value (here -Inf and Inf) so that the outer intervals form properly. Notice further that each breakpoint is exactly (e.g.) 20, and not <=20, >=21; that is, there cannot be a 'gap' between 20 and 21 such that 20.5 would be left out. By default cut() closes the intervals on the right, e.g. (20,25]; right=FALSE closes them on the left, e.g. [20,25), so that an age of exactly 20 falls into the 20-24 cohort rather than into <20.

From there, if you want the shares of all rows with the same ageRange to be summed, you can create a new data frame with aggregate():

> newData <- aggregate(share~ageRange, myData, sum)
> newData
  ageRange share
1      <20  0.02
2    20-24  0.08
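To see why the closing side of the intervals matters at the boundaries, here is a minimal sketch comparing cut()'s default right=TRUE with right=FALSE. The ages 19, 20, 24 and 25 and the breaks are illustrative values chosen to sit on or next to a breakpoint; they are not taken from the question.

```r
# Illustrative ages sitting just below and exactly at the break points
ages <- c(19, 20, 24, 25)
brks <- c(-Inf, 20, 25, Inf)

# Default right = TRUE: intervals are (a,b], so 20 stays in the lower group
byRight <- cut(ages, breaks = brks)

# right = FALSE: intervals are [a,b), so 20 starts the next group
byLeft <- cut(ages, breaks = brks, right = FALSE)

# Compare the interval each age was assigned to
print(data.frame(age = ages, byRight = byRight, byLeft = byLeft))

# Integer codes of the assigned intervals: 1 1 2 2 versus 1 2 2 3
print(as.integer(byRight))
print(as.integer(byLeft))
```

With right=TRUE an age of exactly 20 lands in the lower group; with right=FALSE it starts the next group, which is what a cohort labelled 20-24 needs.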
