gymbrane

Reputation: 167

Summary stats of a variable for each unique value of a grouping variable

I have a longitudinal spreadsheet that contains different growth variables for many individuals. At the moment my R code looks like this:

D5<-ifelse(growth$agyr == 5, growth$R.2ND.DIG.AVERAGE,NA)

Since it is longitudinal, I have the same measurement for each individual at multiple ages, hence the variable agyr. In this example it takes all kids who have a finger measurement at age 5.

What I would like is to do that for all ages without defining a new object every time, so I can run summary stats on finger length for any given agyr. Surely this is possible, but I am still a beginner at R.

Upvotes: 1

Views: 252

Answers (1)

Joris Meys

Reputation: 108553

tapply() is your friend here. For the mean for example:

with(growth,
     tapply(R.2ND.DIG.AVERAGE,agyr,mean)
)

See also ?tapply and a good introductory book on R. And also ?with, a function that can make your code a lot more readable.
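As a minimal sketch of the tapply() call above on made-up data (the `toy` data frame and its values are hypothetical; only the column names come from the question), note that extra arguments such as na.rm = TRUE are passed on to the summary function:

```r
# Hypothetical toy data; only the column names come from the question
toy <- data.frame(
  agyr = c(5, 5, 6, 6, 7),
  R.2ND.DIG.AVERAGE = c(4.0, 4.4, 4.6, NA, 5.1)
)

# Mean finger length per age; na.rm = TRUE is passed on to mean()
means <- with(toy, tapply(R.2ND.DIG.AVERAGE, agyr, mean, na.rm = TRUE))

# Several statistics per age at once: one summary() result per age
stats <- with(toy, tapply(R.2ND.DIG.AVERAGE, agyr, summary))
```

The result is named by age, so a single age can be picked out with means["5"].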

If you want to average over several grouping variables, you can give tapply() a list of factors. Say gender is a variable as well (a factor!), you can do e.g.:

with(growth,
     tapply(R.2ND.DIG.AVERAGE,list(agyr,gender),mean)
)
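A small sketch of what the two-factor call returns, again on hypothetical data (the `toy` values are invented for illustration): the result is a matrix with one row per age and one column per gender, and combinations with no observations come back as NA.

```r
# Hypothetical toy data; only the column names come from the question
toy <- data.frame(
  agyr = c(5, 5, 5, 6, 6, 6),
  gender = factor(c("F", "M", "M", "F", "F", "M")),
  R.2ND.DIG.AVERAGE = c(4.0, 4.2, 4.4, 4.5, 4.7, 4.8)
)

# Rows are ages, columns are genders
tab <- with(toy, tapply(R.2ND.DIG.AVERAGE, list(agyr, gender), mean))

# Index by name to get one cell, e.g. the mean for boys at age 5
boys5 <- tab["5", "M"]
```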

tapply() returns an array-like structure (a vector, matrix or multidimensional array, depending on the number of categorizing factors). If you want your results in a data frame and/or want to summarize multiple variables at once, look at ?aggregate, e.g.:

thevars <- c("R.2ND.DIG.AVERAGE","VAR2","MOREVAR")
# agyr and gender live inside growth, so refer to them explicitly
aggregate(growth[thevars], by=list(agyr=growth$agyr, gender=growth$gender), FUN="mean")

or using the formula interface:

aggregate(cbind(R.2ND.DIG.AVERAGE,VAR2,MOREVAR) ~ agyr + gender, 
         data=growth, FUN = "mean")

Make sure you check the help files as well. Both tapply() and aggregate() are quite powerful and have plenty of other possibilities.
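One difference worth knowing when the data contain missing values, sketched here on hypothetical data (the `toy` values are invented): the formula interface of aggregate() drops rows with NA by default (na.action = na.omit), while the `by` interface hands the NAs to FUN, so na.rm = TRUE has to be passed along.

```r
# Hypothetical toy data; only the column names come from the question
toy <- data.frame(
  agyr = c(5, 5, 6, 6),
  R.2ND.DIG.AVERAGE = c(4.0, 4.4, 4.6, NA)
)

# Formula interface: rows with NA are removed before grouping
res_formula <- aggregate(R.2ND.DIG.AVERAGE ~ agyr, data = toy, FUN = mean)

# `by` interface: NAs reach FUN, so na.rm = TRUE is passed on to mean()
res_by <- aggregate(toy["R.2ND.DIG.AVERAGE"], by = list(agyr = toy$agyr),
                    FUN = mean, na.rm = TRUE)
```

Both calls return a data frame with one row per age, which is usually easier to merge or plot than the array that tapply() gives.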

Upvotes: 1
