Sandra
Sandra

Reputation: 25

Calculate the mean per subject and repeat the value for each subject's row

This is the first time that I ask a question on stack overflow. I have tried searching for the answer but I cannot find exactly what I am looking for. I hope someone can help.

I have a huge data set of 20416 observation. Basically, I have 83 subjects and for each subject I have several observations. However, the number of observations per subject is not the same (e.g. subject 1 has 256 observations, while subject 2 has only 64 observations). I want to add an extra column containing the mean of the observations for each subject (the observations are reading times (RT)).

I tried with the aggregate function:

aggregate (RT ~ su, data, mean)

This formula returns the correct mean per subject. But then I cannot simply do the following:

data$mean <- aggregate (RT ~ su, data, mean)

as R returns this error:

Error in $<-.data.frame(tmp, "mean", value = list(su = 1:83, RT = c(378.1328125, : replacement has 83 rows, data has 20416

I understand that the formula lacks a command specifying that the mean for each subject has to be repeated for all the subject's rows (e.g. if subject 1 has 256 rows, the mean for subject 1 has to be repeated for 256 rows, if subject 2 has 64 rows, the mean for subject 2 has to be repeated for 64 rows and so forth).

How can I achieve this in R?

Upvotes: 2

Views: 5053

Answers (4)

Gregor Thomas
Gregor Thomas

Reputation: 145825

Staying in Base R, ave is intended for this use:

data$mean = with(data, ave(x = RT, su, FUN = mean))

Upvotes: 5

daniel
daniel

Reputation: 1246

Another compelling way of handling this without generating extra data objects is by using group_by of dplyr package:

# Generating some data
data <- data.table::data.table(
     su = sample(letters[1:5], size = 14, replace = TRUE),
     RT = rnorm(14))[order(su)]

# Performing
> data %>% group_by(su) %>% 
+     mutate(Mean = mean(RT)) %>%
+     ungroup() 
Source: local data table [14 x 3]

   su          RT       Mean
1   a -1.62841746  0.2096967
2   a  0.07286149  0.2096967
3   a  0.02429030  0.2096967
4   a  0.98882343  0.2096967
5   a  0.95407214  0.2096967
6   a  1.18823435  0.2096967
7   a -0.13198711  0.2096967
8   b -0.34897914  0.1469982
9   b  0.64297557  0.1469982
10  c -0.58995261 -0.5899526
11  d -0.95995198  0.3067978
12  d  1.57354754  0.3067978
13  e  0.43071258  0.2462978
14  e  0.06188307  0.2462978 

Upvotes: 1

nrussell
nrussell

Reputation: 18612

The data.table syntax lends itself well to this kind of problem:

Dt[, Mean := mean(Value), by = "ID"][]
#     ID       Value        Mean
# 1:   a  0.05881156 0.004426491
# 2:   a -0.04995858 0.004426491
# 3:   b  0.64054432 0.038809830
# 4:   b -0.56292466 0.038809830
# 5:   c  0.44254622 0.099747707
# 6:   c -0.10771992 0.099747707
# 7:   c -0.03558318 0.099747707
# 8:   d  0.56727423 0.532377247
# 9:   d -0.60962095 0.532377247
# 10:  d  1.13808538 0.532377247
# 11:  d  1.03377033 0.532377247
# 12:  e  1.38789640 0.568760936
# 13:  e -0.57420308 0.568760936
# 14:  e  0.89258949 0.568760936

As we are applying a grouped operation (by = "ID"), data.table will automatically replicate each group's mean(Value) the appropriate number of times (avoiding the error you ran into above).


Data:

Dt <- data.table::data.table(
  ID = sample(letters[1:5], size = 14, replace = TRUE),
  Value = rnorm(14))[order(ID)]

Upvotes: 6

Parfait
Parfait

Reputation: 107652

Simply merge your aggregated means data with full dataframe joined by the subject:

aggdf <- aggregate (RT ~ su, data, mean)

names(aggdf)[2] <- "MeanOfRT"
df <- merge(df, aggdf, by="su")

Upvotes: 1

Related Questions