Reputation: 41

Column Mean by Factors

I would like to create a table of column means grouped by the Strain factor.

I have the following data:

    Age Strain            103             3           163            39
V2   28  101CD  -3.4224173012 -0.3360570164 -9.2417448649 -3.6094766494
V3   28  101CD  -3.6487198656 -0.7948262475 -4.6350611123 -1.9232938265
V4   28  101CD  -7.0936427264 -0.1981243536 -9.2063428591  -3.367139071
V5   28  101CD  -5.9245254437 -0.1161875584 -7.3830396092 -4.7980771085
V6   30 101HFD  -9.4618204696 -5.0355557149 -3.9915005349 -0.9271933496
V7   30 101HFD   -8.805867863  -2.667103793 -2.2489197384 -1.5169130813
V8   30 101HFD -10.9841335945 -2.9617657815 -3.3460597574  -1.121806194
V9   30 101HFD -10.4612747952 -4.3759351258 -4.4322637085  -0.772499965
V10  30 101HFD  -9.2871507889 -1.2664335711 -4.3142098012 -1.3791233817
V11  30 101HFD -10.9443983294 -2.4651954898 -4.7759052834 -1.0954401254
V12  29  103CD  -2.7492530803 -2.0659306194 -2.5698186069 -1.4978280502
V13  29  103CD  -6.4401905692 -2.1098420514 -3.4349220483 -0.8836564768
V14  29  103CD   -6.479929929 -2.4792621691  -3.368774934 -0.7756932376
V15  29  103CD  -3.6586850957 -1.9145944032 -3.0911223702 -1.2730896376
V16  29  103CD  -7.1377230731  -1.413139617 -2.9203340711 -1.3152010161
V17  29 103HFD  -9.4624093184 -1.3265834556 -4.1871313168 -1.0108235293
V18  29 103HFD   -7.336764023 -0.8712499419  -4.204313727 -1.4450582002
V19  29 103HFD   -7.036723106 -0.7546877382 -6.0432957599 -1.4161366956
V20  29 103HFD  -9.4449207581 -0.9226067311 -4.6305567775  -1.320094489
V21  29 103HFD  -9.6383454033 -1.9620356763 -3.0214290407 -0.8602682738

And, I want to end up with this:

    Age Strain            103             3           163            39
V1  28   101CD  -3.4224173012 -0.3360570164 -9.2417448649 -3.6094766494
V2  30  101HFD  -9.4618204696 -5.0355557149 -3.9915005349 -0.9271933496
V3  29   103CD  -2.7492530803 -2.0659306194 -2.5698186069 -1.4978280502
V4  29  103HFD  -9.4624093184 -1.3265834556 -4.1871313168 -1.0108235293

Where row [1,] holds the column means for all samples with Strain=101CD, row [2,] holds the column means for all samples with Strain=101HFD, etc.

I have attempted to use:

> ave <- aggregate(data, as.list(factor(data$Age)), mean)
Error in aggregate.data.frame(data, as.list(factor(data$Age)), mean) : arguments must have same length

and

> ave <- sapply(split(data, data$Strain), mean)
 101CD 101HFD  103CD 103HFD   32CD   40CD  40HFD   43CD  43HFD   44CD  44HFD
    NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
...
 97HFD   98CD  98HFD   99CD  99HFD
    NA     NA     NA     NA     NA
There were 50 or more warnings (use warnings() to see the first 50)

and

> ave <- daply(data, data$Strain, mean)
Error in parse(text = x) : <text>:1:4: unexpected symbol
1: 101CD

I feel like there should be a fairly straightforward way to accomplish this, but I have been unable to find a solution.

Upvotes: 4

Answers (2)

jeremycg

Reputation: 24945

You can use dplyr: group_by Strain, then use summarise_each to apply mean (with na.rm set to TRUE) to every remaining column:

library(dplyr)

data %>% group_by(Strain) %>%
         summarise_each(funs(mean(., na.rm=TRUE)))

Source: local data frame [4 x 6]

  Strain   Age      X103         X3      X163       X39
  (fctr) (dbl)     (dbl)      (dbl)     (dbl)     (dbl)
1  101CD    28 -5.022326 -0.3612988 -7.616547 -3.424497
2 101HFD    30 -9.990774 -3.1286649 -3.851476 -1.135496
3  103CD    29 -5.293156 -1.9965538 -3.076994 -1.149094
4 103HFD    29 -8.583833 -1.1674327 -4.417345 -1.210476
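summarise_each() and funs() are deprecated in current dplyr releases. A minimal sketch of the same grouping with across(), assuming dplyr 1.0.0 or later; the small data frame below is a made-up stand-in for the question's data (rounded values, with the X103/X3 column names that appear in the output above):

```r
library(dplyr)

# Hypothetical stand-in for the question's data: two strains, two rows each
data <- data.frame(
  Age    = c(28, 28, 30, 30),
  Strain = factor(c("101CD", "101CD", "101HFD", "101HFD")),
  X103   = c(-3.42, -3.65, -9.46, -8.81),
  X3     = c(-0.34, -0.79, -5.04, -2.67)
)

# One row per Strain; mean of every numeric column, ignoring NA values
result <- data %>%
  group_by(Strain) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

print(result)
```

Selecting with where(is.numeric) means any non-numeric column other than the grouping variable is skipped instead of producing NA.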

Upvotes: 1

Thierry

Reputation: 18487

Exploit the fact that a data.frame is a special kind of list, which is what the by argument of aggregate expects.

aggregate(data, data[, "Age", drop = FALSE], mean)

drop = FALSE is required so that the result of the selection remains a data.frame. data[, "Age"] is equivalent to data[, "Age", drop = TRUE] and will return a vector.
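The call above groups by Age, as in the question's first attempt, and also passes the Strain factor to mean. A sketch of the same idea grouped by Strain and restricted to numeric columns, using a made-up stand-in for the question's data (rounded values; the X103/X3 names are an assumption about how the columns were read in). It also shows the formula interface and a split/colMeans variant of the question's second attempt:

```r
# Hypothetical stand-in for the question's data: two strains, two rows each
data <- data.frame(
  Age    = c(28, 28, 30, 30),
  Strain = factor(c("101CD", "101CD", "101HFD", "101HFD")),
  X103   = c(-3.42, -3.65, -9.46, -8.81),
  X3     = c(-0.34, -0.79, -5.04, -2.67)
)

# Keep only numeric columns so mean() is never applied to the Strain factor
numeric_cols <- sapply(data, is.numeric)

# 'by' must be a list of grouping vectors; a one-column data.frame qualifies
by_strain <- aggregate(data[, numeric_cols],
                       by = data[, "Strain", drop = FALSE],
                       FUN = mean)

# Formula interface: every other column, grouped by Strain
by_strain_formula <- aggregate(. ~ Strain, data = data, FUN = mean)

# split() yields one data.frame per strain; colMeans() averages each column
means_matrix <- t(sapply(split(data[, numeric_cols], data$Strain), colMeans))

print(by_strain)
```

Note that the formula interface drops rows containing NA by default, whereas the first form passes them through to mean.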

Upvotes: 0
