Reputation: 41
I would like to create a table of column means grouped by the Strain factor.
I have the following data:
Age Strain 103 3 163 39
V2 28 101CD -3.4224173012 -0.3360570164 -9.2417448649 -3.6094766494
V3 28 101CD -3.6487198656 -0.7948262475 -4.6350611123 -1.9232938265
V4 28 101CD -7.0936427264 -0.1981243536 -9.2063428591 -3.367139071
V5 28 101CD -5.9245254437 -0.1161875584 -7.3830396092 -4.7980771085
V6 30 101HFD -9.4618204696 -5.0355557149 -3.9915005349 -0.9271933496
V7 30 101HFD -8.805867863 -2.667103793 -2.2489197384 -1.5169130813
V8 30 101HFD -10.9841335945 -2.9617657815 -3.3460597574 -1.121806194
V9 30 101HFD -10.4612747952 -4.3759351258 -4.4322637085 -0.772499965
V10 30 101HFD -9.2871507889 -1.2664335711 -4.3142098012 -1.3791233817
V11 30 101HFD -10.9443983294 -2.4651954898 -4.7759052834 -1.0954401254
V12 29 103CD -2.7492530803 -2.0659306194 -2.5698186069 -1.4978280502
V13 29 103CD -6.4401905692 -2.1098420514 -3.4349220483 -0.8836564768
V14 29 103CD -6.479929929 -2.4792621691 -3.368774934 -0.7756932376
V15 29 103CD -3.6586850957 -1.9145944032 -3.0911223702 -1.2730896376
V16 29 103CD -7.1377230731 -1.413139617 -2.9203340711 -1.3152010161
V17 29 103HFD -9.4624093184 -1.3265834556 -4.1871313168 -1.0108235293
V18 29 103HFD -7.336764023 -0.8712499419 -4.204313727 -1.4450582002
V19 29 103HFD -7.036723106 -0.7546877382 -6.0432957599 -1.4161366956
V20 29 103HFD -9.4449207581 -0.9226067311 -4.6305567775 -1.320094489
V21 29 103HFD -9.6383454033 -1.9620356763 -3.0214290407 -0.8602682738
And, I want to end up with this:
Age Strain 103 3 163 39
V1 28 101CD -3.4224173012 -0.3360570164 -9.2417448649 -3.6094766494
V2 30 101HFD -9.4618204696 -5.0355557149 -3.9915005349 -0.9271933496
V3 29 103CD -2.7492530803 -2.0659306194 -2.5698186069 -1.4978280502
V4 29 103HFD -9.4624093184 -1.3265834556 -4.1871313168 -1.0108235293
Where [1,] is the mean of all columns for all samples with Strain=101CD, [2,] is the mean of all columns for all samples with Strain=101HFD, etc.
I have attempted to use:
> ave <- aggregate(data, as.list(factor(data$Age)), mean)
Error in aggregate.data.frame(data, as.list(factor(data$Age)), mean) : arguments must have same length
and
> ave <- sapply(split(data, data$Strain), mean)
101CD 101HFD 103CD 103HFD 32CD 40CD 40HFD 43CD 43HFD 44CD 44HFD
NA NA NA NA NA NA NA NA NA NA NA
...
97HFD 98CD 98HFD 99CD 99HFD
NA NA NA NA NA
There were 50 or more warnings (use warnings() to see the first 50)
and
> ave <- daply(data, data$Strain, mean)
Error in parse(text = x) : <text>:1:4: unexpected symbol
1: 101CD
I feel like there should be a fairly straightforward way to accomplish this, but I have been unable to find a solution.
Upvotes: 4
Views: 105
Reputation: 24945
You can use dplyr. Here we group_by Strain, then use summarise_each to summarise each column with the function mean, with na.rm set to TRUE:
library(dplyr)
data %>% group_by(Strain) %>%
summarise_each(funs(mean(., na.rm=TRUE)))
Source: local data frame [4 x 6]
Strain Age X103 X3 X163 X39
(fctr) (dbl) (dbl) (dbl) (dbl) (dbl)
1 101CD 28 -5.022326 -0.3612988 -7.616547 -3.424497
2 101HFD 30 -9.990774 -3.1286649 -3.851476 -1.135496
3 103CD 29 -5.293156 -1.9965538 -3.076994 -1.149094
4 103HFD 29 -8.583833 -1.1674327 -4.417345 -1.210476
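In more recent dplyr releases, summarise_each() and funs() are deprecated, so the call above may emit warnings. A minimal sketch of the same grouping with across(), assuming dplyr 1.0.0 or later; the small data frame below is made up for illustration and only mirrors the layout of the question's data:

```r
library(dplyr)

# Illustrative data only: same layout as the question, invented values.
data <- data.frame(
  Age    = c(28, 28, 30, 30),
  Strain = factor(c("101CD", "101CD", "101HFD", "101HFD")),
  X103   = c(-3.4, -3.6, -9.4, -8.8),
  X3     = c(-0.3, -0.7, -5.0, -2.6)
)

# Group by Strain, then take the mean of every non-grouping column.
strain_means_dplyr <- data %>%
  group_by(Strain) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

print(strain_means_dplyr)
```

Within a grouped data frame, everything() skips the grouping column, so Strain is kept as the key and only Age and the measurement columns are averaged.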
Upvotes: 1
Reputation: 18487
Exploit the fact that a data.frame is a special kind of list, so a one-column data.frame can be passed directly as the by argument of aggregate:
aggregate(data, data[, "Age", drop = FALSE], mean)
drop = FALSE is required so that the result of the selection remains a data.frame. data[, "Age"] is equivalent to data[, "Age", drop = TRUE] and will return a vector.
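The call above groups by Age, whereas the question asks for means per Strain, and passing the whole data frame would also try to average the non-numeric Strain column. A minimal base R sketch of the same idea grouped by Strain, using a small made-up data frame (the values and the num_cols helper name are illustrative, not from the question):

```r
# Illustrative data only: same layout as the question, invented values.
data <- data.frame(
  Age    = c(28, 28, 30, 30),
  Strain = factor(c("101CD", "101CD", "101HFD", "101HFD")),
  X103   = c(-3.4, -3.6, -9.4, -8.8),
  X3     = c(-0.3, -0.7, -5.0, -2.6)
)

# Columns to average: everything except the grouping factor.
num_cols <- setdiff(names(data), "Strain")

# data["Strain"] (single brackets, no comma) stays a one-column data.frame,
# i.e. a list, which is what the by argument expects.
strain_means <- aggregate(data[num_cols], by = data["Strain"], FUN = mean)

print(strain_means)
```

Selecting with data["Strain"] has the same effect as data[, "Strain", drop = FALSE], and the name of the grouping column is carried over into the result.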
Upvotes: 0