S jos
S jos

Reputation: 33

Mean and standard deviation by groups

I have a data frame something like this:

obs1 obs2 obs3 obs4 obs5
4     6    7     3    0
7     2    4     5    0
2     5    7     8    1
5     8    6     9    1
6     0    3     6    1
7     1    2     4    1

I want to compute the mean and standard deviation for obs 1 to 4 conditioned on obs5 and put it in a table format. The columns headings should be means and standard deviation for each of whether the obs5 is "0" or "1". Thus in this case the table will be of 4 by 4 type.

I tried

table <- aggregate( .~ obs5, DF, function(x) c(mean = mean(x), sd = sd(x)))

and am not sure what to do further to get proper format.

Upvotes: 3

Views: 3269

Answers (3)

akrun
akrun

Reputation: 887881

We could use data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)), reshape it from 'wide' to 'long' format and then reshape it back to 'wide' format with dcast. The dcast from data.table can take multiple fun.aggregate.

library(data.table)#v1.9.6+
DT <- melt(setDT(df1), id.var='obs5', variable.name='Obs')
dcast(DT, Obs~obs5, value.var='value', fun.aggregate=c(mean, sd))

#    Obs value_mean_0 value_mean_1 value_sd_0 value_sd_1
#1: obs1          5.5         5.00   2.121320   2.160247
#2: obs2          4.0         3.50   2.828427   3.696846
#3: obs3          5.5         4.50   2.121320   2.380476
#4: obs4          4.0         6.75   1.414214   2.217356

Upvotes: 3

stas g
stas g

Reputation: 1523

a bit long-winded but produces output in correct format:

DF <- data.frame(obs1 = c(4, 7, 2, 5, 6, 7), obs2 = c(6, 2, 5, 8, 0, 1), obs3 = c(7, 4, 7, 6, 3, 2), obs4 = c(3, 5, 8, 9, 6, 4), obs5 = c(0, 0, 1, 1, 1, 1))

res <- by(DF[, -5], DF$obs5, FUN = function(x) rbind(colMeans(x), sqrt(diag(var(x)))))
res <- do.call(rbind, res)
rownames(res) <- paste(rep(c('mean', 'sd'), 2), rep(c(0, 1), c(2, 2)), sep = ".")
t(res)

#     mean.0     sd.0 mean.1     sd.1
#obs1    5.5 2.121320   5.00 2.160247
#obs2    4.0 2.828427   3.50 3.696846
#obs3    5.5 2.121320   4.50 2.380476
#obs4    4.0 1.414214   6.75 2.217356

Upvotes: -1

Tim Keighley
Tim Keighley

Reputation: 323

You can calculate the means and standard deviations separately then combine the results together:

means <- aggregate( .~ obs5, DF, mean)
rownames(means) <- paste("mean", means$obs5)
sds <- aggregate( .~ obs5, DF, sd)
rownames(sds) <- paste("sd", means$obs5)

tab <- rbind(means, sds)
tab <- tab[, -1]
tab <- t(tab)

Result:

     mean 0 mean 1     sd 0     sd 1
obs1    5.5   5.00 2.121320 2.160247
obs2    4.0   3.50 2.828427 3.696846
obs3    5.5   4.50 2.121320 2.380476
obs4    4.0   6.75 1.414214 2.217356

Upvotes: 1

Related Questions