R - Mean of specific sections of a vector

Question

I have the following code

mean(  myList$scores[ myList$IDs == "1234" ] )

This gives me the mean of the scores for the person with ID 1234.

Suppose I have a vector of ID numbers that is a small sample of all the ID numbers:

testIDs = c(1234,2345,3456,4567)

How do I change this to return four means, one for each of the IDs 1234, 2345, 3456, and 4567?

I know I could loop over the testIDs, but that's not the best way to go about this.

Raad · Accepted Answer

How about the following approaches (many others exist):

dta <- data.frame(id = rep(letters[1:4], each = 4), x = rnorm(16))

aggregate(dta$x, list(dta$id), mean)
lapply(split(dta$x, dta$id), mean)
tapply(dta$x, dta$id, mean)
by(dta$x, dta$id, mean)
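
The four calls return differently shaped results, which may matter as much as speed when picking one. Here is a minimal sketch that swaps rnorm() for fixed values (an assumption made only so the group means are easy to check by hand) and inspects what each call gives back:

```r
# Fixed values instead of rnorm(), purely for illustration
dta <- data.frame(id = rep(letters[1:4], each = 4), x = 1:16)

res_aggregate <- aggregate(dta$x, list(dta$id), mean)  # data frame with columns Group.1 and x
res_lapply    <- lapply(split(dta$x, dta$id), mean)    # named list, one element per id
res_tapply    <- tapply(dta$x, dta$id, mean)           # named one-dimensional array
res_by        <- by(dta$x, dta$id, mean)               # object of class "by"

str(res_aggregate)
str(res_lapply)
str(res_tapply)
class(res_by)
```

If a plain named numeric vector is what you want, tapply() or sapply() over the split is probably the closest fit.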

Some timings:

Unit: microseconds
expr                                       mean
aggregate(dta$x, list(dta$id), mean)  892.08428
lapply(split(dta$x, dta$id), mean)     61.05315
tapply(dta$x, dta$id, mean)           172.62361
by(dta$x, dta$id, mean)               421.29666
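
The answer does not say how these figures were measured. One rough way to get comparable numbers with base R only is sketched below; time_us is a hypothetical helper written for this example, and the absolute values will differ by machine and R version.

```r
dta <- data.frame(id = rep(letters[1:4], each = 4), x = rnorm(16))

# Hypothetical helper: average elapsed microseconds per call over n repetitions
time_us <- function(f, n = 200) {
  elapsed <- system.time(for (i in seq_len(n)) f())[["elapsed"]]
  elapsed / n * 1e6
}

timings <- c(
  aggregate = time_us(function() aggregate(dta$x, list(dta$id), mean)),
  lapply    = time_us(function() lapply(split(dta$x, dta$id), mean)),
  tapply    = time_us(function() tapply(dta$x, dta$id, mean)),
  by        = time_us(function() by(dta$x, dta$id, mean))
)
timings
```

system.time() has coarse resolution, so treat the output as a relative ranking rather than a precise measurement.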

To get the means for only a subset of the ids:

dta <- data.frame(id = rep(letters[1:10], each = 4), x = rnorm(40))

indx <- dta$id %in% letters[1:4]
lapply(split(dta[indx, 2], dta[indx, 1], drop = TRUE), mean)
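
The drop = TRUE argument matters when id is a factor: subsetting a factor keeps its unused levels, so split() would otherwise return empty groups for the ids that were filtered out. A small sketch, assuming id is explicitly stored as a factor (in recent R versions data.frame() keeps strings as character by default, in which case drop = TRUE is harmless but not needed):

```r
# id is forced to a factor here to show the effect of drop = TRUE
dta <- data.frame(id = factor(rep(letters[1:10], each = 4)), x = 1:40)
indx <- dta$id %in% letters[1:4]

with_drop    <- split(dta$x[indx], dta$id[indx], drop = TRUE)
without_drop <- split(dta$x[indx], dta$id[indx])

length(with_drop)     # 4: only the ids that were kept
length(without_drop)  # 10: six empty groups remain for the unused levels
```

Taking the mean of one of those empty groups yields NaN, which is why dropping them gives the cleaner result.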

Alternatively, the answer in the comments does the trick as well:

sapply(letters[1:4], function(s) mean(dta$x[ dta$id == s ]))
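
Mapped back onto the names used in the question, the same two ideas might look like the sketch below. The contents of myList are made up for illustration; only the element names IDs and scores and the vector testIDs come from the question.

```r
# Hypothetical data in the shape the question describes
myList <- list(
  IDs    = c("1234", "1234", "2345", "2345", "3456", "4567", "9999"),
  scores = c(80, 90, 70, 75, 60, 100, 50)
)
testIDs <- c(1234, 2345, 3456, 4567)

# Filter to the sampled IDs first, then take the mean per ID
keep  <- myList$IDs %in% testIDs
means <- tapply(myList$scores[keep], myList$IDs[keep], mean)

# Equivalent one-liner; as.character() makes sapply() name the result by ID
means2 <- sapply(as.character(testIDs),
                 function(id) mean(myList$scores[myList$IDs == id]))

means
means2
```

Both give one mean per requested ID; the tapply() version passes over the data once, while the sapply() version rescans the IDs for each element of testIDs.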
