Reputation: 167
I have a longitudinal spreadsheet that contains different growth variables for many individuals. At the moment my R code looks like this:
D5<-ifelse(growth$agyr == 5, growth$R.2ND.DIG.AVERAGE,NA)
Since it is longitudinal, I have the same measurement for each individual at multiple ages, hence the variable agyr. This example takes all kids who have a finger measurement at age 5.
What I would like is to do this for all ages at once, so that I don't have to define a new object for each age and can run summary stats on finger length for any given agyr. Surely this is possible, but I am still a beginner at R.
Upvotes: 1
Views: 252
Reputation: 108553
tapply() is your friend here. For the mean, for example:
with(growth,
tapply(R.2ND.DIG.AVERAGE,agyr,mean)
)
See also ?tapply and a good introductory book on R. Also see ?with, a function that can really make your code a lot more readable.
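As a minimal sketch of the call above, here is the same idea on a small made-up data frame. The `id` column and all values are invented for illustration; only `agyr` and `R.2ND.DIG.AVERAGE` come from the question. It also shows two things that may be useful for longitudinal data: extra arguments such as `na.rm = TRUE` are passed on to the summary function, and a function like `summary` can be used in place of `mean`.

```r
# Hypothetical toy data standing in for the asker's `growth` data frame
growth <- data.frame(
  id = rep(1:3, each = 2),
  agyr = rep(c(5, 6), times = 3),
  R.2ND.DIG.AVERAGE = c(4.0, 4.5, 4.2, NA, 3.8, 4.1)
)

# Arguments after the function are passed on to it, so na.rm = TRUE
# makes mean() skip missing measurements instead of returning NA
age_means <- with(growth, tapply(R.2ND.DIG.AVERAGE, agyr, mean, na.rm = TRUE))

# Several statistics per age: this returns a list with one summary per agyr value
age_summaries <- with(growth, tapply(R.2ND.DIG.AVERAGE, agyr, summary))

# Results are named by age, so a single age can be looked up by name
age_means["5"]
age_summaries[["6"]]
```

Because the result is indexed by age, there is no need to create a separate object such as `D5` for every age.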
If you have multiple levels you want to average over, you can give tapply() a list of factors. Say gender is a variable as well (a factor!), you can do e.g.:
with(growth,
tapply(R.2ND.DIG.AVERAGE,list(agyr,gender),mean)
)
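To see what the two-factor call produces, here is a small sketch on invented data. The `gender` coding ("F"/"M") and all measurements are assumptions made up for the example.

```r
# Hypothetical toy data; the gender coding and all values are invented
growth <- data.frame(
  agyr = c(5, 5, 5, 5, 6, 6, 6, 6),
  gender = factor(c("F", "F", "M", "M", "F", "F", "M", "M")),
  R.2ND.DIG.AVERAGE = c(4.0, 4.2, 4.1, 4.3, 4.4, 4.6, 4.5, 4.9)
)

# With two grouping factors the result is a matrix:
# rows are ages, columns are genders
means <- with(growth, tapply(R.2ND.DIG.AVERAGE, list(agyr, gender), mean))

# Index by dimension names, e.g. the mean for "F" at age 5
means["5", "F"]
```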
tapply() returns an array-like structure (a vector, matrix or multidimensional array, depending on the number of categorizing factors). If you want your results in a data frame and/or summarize multiple variables at once, look at ?aggregate, e.g.:
thevars <- c("R.2ND.DIG.AVERAGE","VAR2","MOREVAR")
aggregate(growth[thevars], by=list(agyr=growth$agyr, gender=growth$gender), FUN="mean")
or using the formula interface:
aggregate(cbind(R.2ND.DIG.AVERAGE,VAR2,MOREVAR) ~ agyr + gender,
data=growth, FUN = "mean")
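A runnable sketch of the formula interface on invented data may help show the shape of the result. All values are made up, and `VAR2` is just the placeholder name used above. One thing to be aware of: with the formula interface, rows containing missing values are dropped by default (`na.action = na.omit`).

```r
# Hypothetical toy data; VAR2 is a placeholder and all values are invented
growth <- data.frame(
  agyr = c(5, 5, 6, 6, 5, 5, 6, 6),
  gender = factor(c("F", "F", "F", "F", "M", "M", "M", "M")),
  R.2ND.DIG.AVERAGE = c(4.0, 4.2, 4.4, 4.6, 4.1, 4.3, 4.5, 4.9),
  VAR2 = c(10, 12, 14, 16, 11, 13, 15, 17)
)

# Formula interface: the result is a data frame with one row
# per agyr/gender combination and one column per summarized variable
res <- aggregate(cbind(R.2ND.DIG.AVERAGE, VAR2) ~ agyr + gender,
                 data = growth, FUN = mean)

# Ordinary data frame subsetting then picks out any group
res[res$agyr == 5 & res$gender == "F", ]
```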
Make sure you check the help files as well. Both tapply() and aggregate() are quite powerful and have plenty of other possibilities.
Upvotes: 1