Reputation: 240
I'm trying to write a function that bins ages into different groups.
Suppose my data is the following:
birthyear
1987 1995 1994 1981 1994 1989 1985 1987 1996 1981 1980 1994 1996 1983 1949 1988
1998 1977 1967 1968
And my function is written to convert the birth year into ages and then bin them into 1 of 10 different categories based on a data frame called agebreaks:
>agebreaks
Category Birth.min Birth.max
1 14 to 19 years 2000 1995
2 20 to 24 years 1994 1990
3 25 to 34 years 1989 1980
4 35 to 44 years 1979 1970
5 45 to 54 years 1969 1960
6 55 to 59 years 1959 1955
7 60 to 64 years 1954 1950
8 65 to 74 years 1949 1940
9 75 to 84 years 1939 1930
10 85 years and over 1929 1864
Function:
bin.age <- function(birthyear, agebreak, yyyy = 2014){
p.ages <- yyyy - birthyear
ab <- as.data.frame(agebreak)
min.ab <- yyyy-ab$Birth.min
max.ab <- yyyy-ab$Birth.max
avec <- sort(c(min.ab[1],max.ab[1],min.ab[2],max.ab[2],min.ab[3],max.ab[3],min.ab[4],max.ab[4],min.ab[5],max.ab[5],min.ab[6],max.ab[6],min.ab[7],max.ab[7],min.ab[8],max.ab[8],min.ab[9],max.ab[9],min.ab[10],max.ab[10]))
tmp <- findInterval(p.ages, avec)
tt <- table(tmp)
names(tt)<-c("14 to 19 years","20 to 24 years","25 to 34 years","35 to 44 years","45 to 54 years","55 to 59 years","60 to 64 years","65 to 74 years","75 to 84 years","85 years and over")
return(tt)
}
What I want is all the 14 to 19 year olds grouped, 20 to 24 year olds grouped, and so on. What I get instead of the desired 10 groups is 18 groups. I've tried using cut() as well, to no avail. Any suggestions?
Upvotes: 0
Views: 2914
Reputation: 206232
cut() is probably the correct function here. The thing is you just need to specify the break points of the ranges, not the beginning and end of each interval. The measure is assumed to be continuous.
#input data
birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981,
1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)
agebreaks <- c(1864, 1929, 1939,1949,1954,1959,1969,1979,1989,1994,2000)
#cut
a <- cut(birthyear, agebreaks, include.lowest=TRUE)
#rename
levels(a) <- rev(c("14 to 19 years","20 to 24 years","25 to 34 years",
"35 to 44 years","45 to 54 years","55 to 59 years","60 to 64 years",
"65 to 74 years","75 to 84 years","85 years and over"))
#table
as.data.frame(table(a))
#result
a Freq
1 85 years and over 0
2 75 to 84 years 0
3 65 to 74 years 1
4 60 to 64 years 0
5 55 to 59 years 0
6 45 to 54 years 2
7 35 to 44 years 1
8 25 to 34 years 9
9 20 to 24 years 3
10 14 to 19 years 4
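If you'd rather not type the break points by hand, they can probably be derived from the data frame in the question. The following is only a sketch: bin_age and agebreaks_df are hypothetical names, and it assumes the last row's Birth.min is 1929 so that the ranges do not overlap, consistent with the break points used above. Passing labels= to cut() also avoids the separate renaming step.

```r
# Hypothetical helper: bin birth years using a data frame of ranges.
# Assumes columns Category, Birth.min (latest year) and Birth.max (earliest
# year), laid out as in the question.
bin_age <- function(birthyear, agebreaks_df) {
  # Order the categories from the earliest birth years to the latest
  ab <- agebreaks_df[order(agebreaks_df$Birth.max), ]
  # One break per boundary: the lowest year, then the top year of each range
  breaks <- c(min(ab$Birth.max), ab$Birth.min)
  binned <- cut(birthyear, breaks = breaks,
                labels = as.character(ab$Category),
                include.lowest = TRUE)
  as.data.frame(table(binned))
}

# Data from the question (last Birth.min assumed to be 1929)
agebreaks_df <- data.frame(
  Category = c("14 to 19 years", "20 to 24 years", "25 to 34 years",
               "35 to 44 years", "45 to 54 years", "55 to 59 years",
               "60 to 64 years", "65 to 74 years", "75 to 84 years",
               "85 years and over"),
  Birth.min = c(2000, 1994, 1989, 1979, 1969, 1959, 1954, 1949, 1939, 1929),
  Birth.max = c(1995, 1990, 1980, 1970, 1960, 1955, 1950, 1940, 1930, 1864),
  stringsAsFactors = FALSE
)
birthyear <- c(1987, 1995, 1994, 1981, 1994, 1989, 1985, 1987, 1996, 1981,
               1980, 1994, 1996, 1983, 1949, 1988, 1998, 1977, 1967, 1968)

res <- bin_age(birthyear, agebreaks_df)
print(res)
```

Because the derived breaks are the same 11 values as the hand-typed vector, this should give the same counts as the table above, with one row per category.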
Upvotes: 2