Reputation: 185
I have a dataframe that looks like this:
genotype DIV3 DIV4 ...
WT 12.4 15.2
WT 35.4 35.3
HET 1.3 1.2
HET 1.5 5.2
I want to calculate the median of each column for each group, but I'm not sure of the best way to do this in R. I would prefer not to have to refer to the genotypes explicitly, as they may not be the same in other datasets.
Upvotes: 2
Views: 3793
Reputation: 24623
A data.table version also works well:
library(data.table)
ddt <- as.data.table(df)  # assumes the original data frame is called df
ddt[, lapply(.SD, median), by = genotype]
genotype DIV3 DIV4
1: WT 23.9 25.25
2: HET 1.4 3.20
Upvotes: 2
Reputation: 93938
I find it amazing that no one has suggested aggregate yet, seeing as it is the simple, base R function included for these sorts of tasks. E.g.:
aggregate(. ~ genotype, data=dat, FUN=median)
# genotype DIV3 DIV4
#1 HET 1.4 3.20
#2 WT 23.9 25.25
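For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the four example rows from the question and applies the same aggregate call. The object name dat and the use of only those four rows are assumptions. The "." on the left of the formula stands for every column other than genotype, so the value columns never need to be named.

```r
# Rebuild the four example rows from the question (assumed object name: dat)
dat <- data.frame(
  genotype = c("WT", "WT", "HET", "HET"),
  DIV3 = c(12.4, 35.4, 1.3, 1.5),
  DIV4 = c(15.2, 35.3, 1.2, 5.2)
)

# "." expands to every column that is not used on the right-hand side
res <- aggregate(. ~ genotype, data = dat, FUN = median)
print(res)
```

Because neither the value columns nor the genotype levels are written out, the same call should carry over to datasets with more DIV columns or different groups, provided the grouping column is still called genotype.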
Upvotes: 5
Reputation: 4216
In general, I think it's good practice to use dplyr solutions instead of plyr. It's supposed to be a big improvement in terms of speed and readability. See this link.
For example:
library(dplyr)
df %>%
  group_by(genotype) %>%
  summarize(
    DIV3_median = median(DIV3),
    DIV4_median = median(DIV4)
  )
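Since the question asks to avoid hard-coding names, a variant that summarises every numeric column at once may be closer to what is wanted. This is only a sketch, assuming dplyr 1.0.0 or later (for across() and where()); the data frame df is rebuilt here from the question's example rows.

```r
library(dplyr)

# Example rows from the question (assumed object name: df)
df <- data.frame(
  genotype = c("WT", "WT", "HET", "HET"),
  DIV3 = c(12.4, 35.4, 1.3, 1.5),
  DIV4 = c(15.2, 35.3, 1.2, 5.2)
)

# Take the median of every numeric column within each genotype
res <- df %>%
  group_by(genotype) %>%
  summarise(across(where(is.numeric), median))
print(res)
```

Unlike the explicit version, this keeps the original column names (DIV3, DIV4) rather than adding a _median suffix.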
Upvotes: 0
Reputation: 9628
Try this:
apply(df[,-1], 2, function(x) tapply(x, df[,1], median))
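To make the shape of the result concrete: tapply computes one value per group for a single column, and apply(..., 2, ...) repeats that for every remaining column, giving a matrix with one row per group. A small sketch on the question's example rows (the object name df is assumed), using the median as the question asks and referring to columns by position only:

```r
# Example rows from the question (assumed object name: df)
df <- data.frame(
  genotype = c("WT", "WT", "HET", "HET"),
  DIV3 = c(12.4, 35.4, 1.3, 1.5),
  DIV4 = c(15.2, 35.3, 1.2, 5.2)
)

# Column 1 holds the groups; every other column is summarised per group
res <- apply(df[, -1], 2, function(x) tapply(x, df[, 1], median))
print(res)
```

This relies on the grouping column being first and all other columns being numeric, which holds for the example but is an assumption for other datasets.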
Upvotes: 2
Reputation: 185
I found ddply (from the plyr package) to be the best for this.
library(plyr)
medians <- ddply(a, .(genotype), numcolwise(median))
Upvotes: 2