Column names and column order

Question

My data.frame:

df
ID Time  a  b  c  d  e
WT    A 28 56 50 60 15
WT    B 54 77 11 67 34
WT    C 53  8 87 62 55
WT    D 30 73 47 82  1
KO    A 24 83 14 17 36
KO    B 91 83 72 41  4
KO    C 79 17 76 21 54
KO    D 41 40 77 49 92

I subset and average groups df:

use_col=3:ncol(df)
mymean<-aggregate(df[, use_col],by= list(df$ID, df$Time),FUN = function(X)mean(X,na.rm=T))

Group.1 Group.2  a  b  c  d  e
  WT       A    51 52 49 29 47
  KO       A     8 17 78 64 96
  WT       B    79  5 45 83 56
  KO       B    53 47 37 99 17
  WT       C    72 38 56 63 40
  KO       C    51  3 30 47 52
  WT       D     3 30 75 53 73
  KO       D    13 72 16 52 13

Why do I loose the first two columns names and how could I keep them? Also, I try to factor mymean data.frame by:

mymean$Group.1=factor(mymean$Group.1, c("WT","KO"))

but it does not work. Thank you for the help.

Actually the mean should not have Time and should be like that:

mymean<-aggregate(df[, use_col],by= list(df$ID, ),FUN = function(X)mean(X,na.rm=T))

However the output is:

ID     a     b     c     d    e
KO 75.75 44.25 61.75 52.50 39.0
WT 56.00 57.00 84.25 58.75 39.5

But it should be the other way round like this:

ID     a     b     c     d    e
WT 56.00 57.00 84.25 58.75 39.5   
KO 75.75 44.25 61.75 52.50 39.0

Rich Scriven · Accepted Answer

In your aggregate() call, change the by argument to have named list elements

by = list(ID = df$ID, Time = df$Time)

Then for your updated question, you could use

use_col = 3:ncol(df)
mymean <- aggregate(df[, use_col], by = list(ID = df$ID), mean, na.rm=TRUE)
mymean[order(mymean$ID, decreasing = TRUE), ]
#   ID     a     b     c     d     e
# 2 WT 41.25 53.50 48.75 67.75 26.25
# 1 KO 58.75 55.75 59.75 32.00 46.50

Presumably these values are different from yours because you are using a different data set.

You could also do this with less pain with data.table

library(data.table)
## convert to data table
dt <- as.data.table(df)
## order by decreasing ID
setorderv(dt, "ID", -1L)
## remove the Time column then find the mean of all columns by ID
dt[, lapply(.SD, mean, na.rm = TRUE), by = ID, .SDcols = use_col]
#    ID     a     b     c     d     e
# 1: WT 41.25 53.50 48.75 67.75 26.25
# 2: KO 58.75 55.75 59.75 32.00 46.50

Column names and column order

Answers (2)

Related Questions