Reputation: 1814
My data.frame:
df
ID Time a b c d e
WT A 28 56 50 60 15
WT B 54 77 11 67 34
WT C 53 8 87 62 55
WT D 30 73 47 82 1
KO A 24 83 14 17 36
KO B 91 83 72 41 4
KO C 79 17 76 21 54
KO D 41 40 77 49 92
I subset and average groups df:
use_col=3:ncol(df)
mymean<-aggregate(df[, use_col],by= list(df$ID, df$Time),FUN = function(X)mean(X,na.rm=T))
Group.1 Group.2 a b c d e
WT A 51 52 49 29 47
KO A 8 17 78 64 96
WT B 79 5 45 83 56
KO B 53 47 37 99 17
WT C 72 38 56 63 40
KO C 51 3 30 47 52
WT D 3 30 75 53 73
KO D 13 72 16 52 13
Why do I loose the first two columns names and how could I keep them? Also, I try to factor mymean data.frame by:
mymean$Group.1=factor(mymean$Group.1, c("WT","KO"))
but it does not work. Thank you for the help.
Actually the mean should not have Time and should be like that:
mymean<-aggregate(df[, use_col],by= list(df$ID, ),FUN = function(X)mean(X,na.rm=T))
However the output is:
ID a b c d e
KO 75.75 44.25 61.75 52.50 39.0
WT 56.00 57.00 84.25 58.75 39.5
But it should be the other way round like this:
ID a b c d e
WT 56.00 57.00 84.25 58.75 39.5
KO 75.75 44.25 61.75 52.50 39.0
Upvotes: 0
Views: 92
Reputation: 887711
In aggregate
, another option is the formula method, where we can specify all the non-grouping columns with .
on the LHS of ~
and the grouping column on the RHS. As we don't need the 'Time' column in the mean
, we can use a subset
of the dataset, get the mean
, specify the additional arguments na.rm=TRUE
, na.action=NULL
(it is to avoid removing an entire row if there is an NA
element) and order
the output based on the 'ID' column.
res <- aggregate(.~ID, subset(df, select=-Time),
FUN=mean, na.rm=TRUE, na.action=NULL)
res[order(res$ID, decreasing=TRUE),]
# ID a b c d e
#2 WT 41.25 53.50 48.75 67.75 26.25
#1 KO 58.75 55.75 59.75 32.00 46.50
We could also use summarise_each
from dplyr
as another option, where we group by 'ID' column and get the mean
of all other columns except the the 'Time' and order the output using ID
.
library(dplyr)
df %>%
group_by(ID) %>%
summarise_each(funs(mean=mean(., na.rm=TRUE)), -Time) %>%
arrange(desc(ID))
# ID a b c d e
#1 WT 41.25 53.50 48.75 67.75 26.25
#2 KO 58.75 55.75 59.75 32.00 46.50
Upvotes: 1
Reputation: 99361
In your aggregate()
call, change the by
argument to have named list elements
by = list(ID = df$ID, Time = df$Time)
Then for your updated question, you could use
use_col = 3:ncol(df)
mymean <- aggregate(df[, use_col], by = list(ID = df$ID), mean, na.rm=TRUE)
mymean[order(mymean$ID, decreasing = TRUE), ]
# ID a b c d e
# 2 WT 41.25 53.50 48.75 67.75 26.25
# 1 KO 58.75 55.75 59.75 32.00 46.50
Presumably these values are different from yours because you are using a different data set.
You could also do this with less pain with data.table
library(data.table)
## convert to data table
dt <- as.data.table(df)
## order by decreasing ID
setorderv(dt, "ID", -1L)
## remove the Time column then find the mean of all columns by ID
dt[, lapply(.SD, mean, na.rm = TRUE), by = ID, .SDcols = use_col]
# ID a b c d e
# 1: WT 41.25 53.50 48.75 67.75 26.25
# 2: KO 58.75 55.75 59.75 32.00 46.50
Upvotes: 2