Archimeow

Reputation: 240

Binning ages in R

I'm trying to write a function that bins ages into different groups.

Suppose my data is the following:

birthyear

1987 1995 1994 1981 1994 1989 1985 1987 1996 1981 1980 1994 1996 1983 1949 1988
1998 1977 1967 1968

My function converts the birth years into ages and then bins them into one of 10 categories defined by a data frame called agebreaks:

>agebreaks
                Category Birth.min Birth.max
1       14 to 19 years      2000      1995
2       20 to 24 years      1994      1990
3       25 to 34 years      1989      1980
4       35 to 44 years      1979      1970
5       45 to 54 years      1969      1960
6       55 to 59 years      1959      1955
7       60 to 64 years      1954      1950
8       65 to 74 years      1949      1940
9       75 to 84 years      1939      1930
10   85 years and over      1929      1864
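As a sanity check on the table, each row can be turned into an age range by subtracting its birth years from the reference year. This is a minimal sketch that assumes 2014 as the reference year (the year passed to bin.age()) and takes the last row's Birth.min as 1929 (2014 minus 85); `ref_year`, `Age.min`, `Age.max` and `contiguous` are illustrative names, not part of the question.

```r
# agebreaks as in the question; last Birth.min assumed to be 1929 (2014 - 85)
agebreaks <- data.frame(
  Category  = c("14 to 19 years", "20 to 24 years", "25 to 34 years",
                "35 to 44 years", "45 to 54 years", "55 to 59 years",
                "60 to 64 years", "65 to 74 years", "75 to 84 years",
                "85 years and over"),
  Birth.min = c(2000, 1994, 1989, 1979, 1969, 1959, 1954, 1949, 1939, 1929),
  Birth.max = c(1995, 1990, 1980, 1970, 1960, 1955, 1950, 1940, 1930, 1864),
  stringsAsFactors = FALSE
)

ref_year <- 2014  # assumed reference year

# Age range implied by each row
agebreaks$Age.min <- ref_year - agebreaks$Birth.min
agebreaks$Age.max <- ref_year - agebreaks$Birth.max

# Each range should start exactly one year after the previous one ends
n <- nrow(agebreaks)
contiguous <- all(agebreaks$Age.min[-1] == agebreaks$Age.max[-n] + 1)
contiguous
```

A value such as 1959 in the last row would make `contiguous` come out FALSE, because that row would then overlap the 55 to 59 category.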

Function:

    bin.age <- function(birthyear, agebreak, yyyy = 2014){
      # convert birth years to ages in the reference year yyyy
      p.ages <- yyyy - birthyear
      ab     <- as.data.frame(agebreak)
      # lower and upper age of every category
      min.ab <- yyyy - ab$Birth.min
      max.ab <- yyyy - ab$Birth.max
      # all of those ages, sorted, used as break points
      avec   <- sort(c(min.ab, max.ab))

      tmp <- findInterval(p.ages, avec)
      tt  <- table(tmp)
      names(tt) <- c("14 to 19 years", "20 to 24 years", "25 to 34 years",
                     "35 to 44 years", "45 to 54 years", "55 to 59 years",
                     "60 to 64 years", "65 to 74 years", "75 to 84 years",
                     "85 years and over")
      return(tt)
    }

What I want is all the 14 to 19 year olds grouped together, the 20 to 24 year olds grouped together, and so on. Instead of the desired 10 groups, I get 18. I've also tried cut(), without success. Any suggestions?
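A quick way to see where extra groups can come from, as a sketch that assumes 2014 as the reference year and hard-codes the age bounds implied by agebreaks (`ages`, `lower`, `upper`, `codes` and `counts` are illustrative names): avec holds both ends of every range, so it has 20 break points and findInterval() can return 21 different codes (0 to 20). Passing only the lower bound of each range gives exactly one code per category.

```r
# Birth years from the question, converted to ages (assumed reference year 2014)
birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981,
               1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)
ages <- 2014 - birthyear

# Both ends of every range, as bin.age() builds them: 20 break points
lower <- c(14, 20, 25, 35, 45, 55, 60, 65, 75, 85)
upper <- c(19, 24, 34, 44, 54, 59, 64, 74, 84, 150)
avec  <- sort(c(lower, upper))
length(avec)  # 20, so findInterval() codes can range from 0 to 20

# Lower bounds only: code 1 = "14 to 19 years", ..., code 10 = "85 years and over"
codes  <- findInterval(ages, lower)
counts <- tabulate(codes, nbins = length(lower))
counts
```

tabulate() with `nbins` keeps empty categories as zeros, which table() on the raw codes would drop.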

Upvotes: 0

Views: 2914

Answers (1)

MrFlick

Reputation: 206232

cut() is probably the right function here. You only need to give it the break points between the ranges, not the start and end of every interval: cut() treats the measure as continuous, so 11 break points define 10 intervals.
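The same idea also works on ages instead of birth years. A minimal sketch, assuming 2014 as the reference year; `ages`, `age_breaks`, `age_labels` and `age_group` are illustrative names:

```r
# Birth years from the question, converted to ages (assumed reference year 2014)
birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981,
               1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)
ages <- 2014 - birthyear

# 11 break points -> 10 intervals; cut() closes intervals on the right by default,
# so (13,19] holds ages 14-19, (19,24] holds 20-24, ..., (84,Inf] holds 85 and over
age_breaks <- c(13, 19, 24, 34, 44, 54, 59, 64, 74, 84, Inf)
age_labels <- c("14 to 19 years", "20 to 24 years", "25 to 34 years",
                "35 to 44 years", "45 to 54 years", "55 to 59 years",
                "60 to 64 years", "65 to 74 years", "75 to 84 years",
                "85 years and over")

age_group <- cut(ages, breaks = age_breaks, labels = age_labels)
counts <- table(age_group)
counts
```

Ages below 14 fall outside every interval and become NA, so they are silently left out of the table.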

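#input data
birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981, 
    1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)
#lowest birth year, then the latest birth year of each category
agebreaks <- c(1864, 1929, 1939, 1949, 1954, 1959, 1969, 1979, 1989, 1994, 2000)

#cut: intervals are (1864,1929], (1929,1939], ..., (1994,2000];
#include.lowest=TRUE also closes the first interval at 1864
a <- cut(birthyear, agebreaks, include.lowest=TRUE)
#rename: levels run from the earliest birth years (oldest people) to the latest
levels(a) <- rev(c("14 to 19 years","20 to 24 years","25 to 34 years",
    "35 to 44 years","45 to 54 years","55 to 59 years","60 to 64 years",
    "65 to 74 years","75 to 84 years","85 years and over"))

#table
as.data.frame(table(a))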

#result
                   a Freq
1  85 years and over    0
2     75 to 84 years    0
3     65 to 74 years    1
4     60 to 64 years    0
5     55 to 59 years    0
6     45 to 54 years    2
7     35 to 44 years    1
8     25 to 34 years    9
9     20 to 24 years    3
10    14 to 19 years    4
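To avoid hard-coding the break vector, the breaks and labels can be read off a data frame like the question's agebreaks. This is a sketch, not the answer's code: it reuses the name bin.age for a hypothetical cut()-based rewrite and assumes the last row's Birth.min is 1929, which is what the answer's break points imply.

```r
birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981,
               1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)

# agebreaks as in the question; last Birth.min assumed to be 1929
agebreaks <- data.frame(
  Category  = c("14 to 19 years", "20 to 24 years", "25 to 34 years",
                "35 to 44 years", "45 to 54 years", "55 to 59 years",
                "60 to 64 years", "65 to 74 years", "75 to 84 years",
                "85 years and over"),
  Birth.min = c(2000, 1994, 1989, 1979, 1969, 1959, 1954, 1949, 1939, 1929),
  Birth.max = c(1995, 1990, 1980, 1970, 1960, 1955, 1950, 1940, 1930, 1864),
  stringsAsFactors = FALSE
)

# Hypothetical cut()-based bin.age(): breaks are the lowest birth year
# followed by each category's latest birth year, in increasing order
bin.age <- function(birthyear, agebreaks) {
  ord    <- order(agebreaks$Birth.min)  # oldest category first
  breaks <- c(min(agebreaks$Birth.max), agebreaks$Birth.min[ord])
  a <- cut(birthyear, breaks, labels = agebreaks$Category[ord],
           include.lowest = TRUE)
  as.data.frame(table(a))
}

res <- bin.age(birthyear, agebreaks)
res
```

With these values the breaks come out as 1864, 1929, 1939, ..., 2000, the same vector used above. If two rows share a Birth.min (for example 1959 twice), cut() stops with "'breaks' are not unique", which is a useful early warning about a typo in the table.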

Upvotes: 2
