bdevil

Reputation: 185

Calculating median for each column of grouped data

I have a dataframe that looks like this:

 genotype     DIV3     DIV4 ...
 WT           12.4     15.2
 WT           35.4     35.3
 HET          1.3      1.2
 HET          1.5      5.2

I want to calculate the median of each column for each group, but I'm not sure of the best way to do this in R. I would prefer not to have to name each genotype explicitly, as the genotypes may differ in other datasets.

Upvotes: 2

Views: 3793

Answers (5)

rnso

Reputation: 24623

A data.table version is also good (here ddt is the data held as a data.table):

library(data.table)
ddt[,lapply(.SD, median),by=genotype]
   genotype DIV3  DIV4
1:       WT 23.9 25.25
2:      HET  1.4  3.20

Upvotes: 2

thelatemail

Reputation: 93938

I find it amazing that no one has suggested aggregate yet, seeing as it is the simple base R function included for these sorts of tasks. E.g.:

aggregate(. ~ genotype, data=dat, FUN=median)

#  genotype DIV3  DIV4
#1      HET  1.4  3.20
#2       WT 23.9 25.25
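For anyone who wants to reproduce this, here is a minimal sketch. It assumes the data frame is called dat and contains only the four rows shown in the question; the object name res is just for illustration. Because the formula uses `.` on the left-hand side, every non-grouping column is summarised, and no genotype value has to be typed out.

```r
# Toy data copied from the question (only the rows shown there)
dat <- data.frame(
  genotype = c("WT", "WT", "HET", "HET"),
  DIV3 = c(12.4, 35.4, 1.3, 1.5),
  DIV4 = c(15.2, 35.3, 1.2, 5.2)
)

# "." means "all other columns", so nothing is hard-coded
# apart from the name of the grouping column
res <- aggregate(. ~ genotype, data = dat, FUN = median)
print(res)
```

Note that the formula interface drops rows containing NA by default, so with missing values the result may differ from a per-column median.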

Upvotes: 5

rsoren

Reputation: 4216

In general, I think it's good practice to use dplyr solutions instead of plyr. It's supposed to be a big improvement in terms of speed and readability. See this link.

For example:

require(dplyr)
df %>%
  group_by(genotype) %>%
  summarize(
    DIV3_median = median(DIV3),
    DIV4_median = median(DIV4)
  )
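The example above names each column by hand. If there are many DIV columns, something along these lines may be more convenient. This is only a sketch: it assumes dplyr 1.0.0 or later (for across()), and the toy data frame and the name res are made up for illustration from the rows in the question.

```r
library(dplyr)

# Toy data copied from the question (only the rows shown there)
df <- data.frame(
  genotype = c("WT", "WT", "HET", "HET"),
  DIV3 = c(12.4, 35.4, 1.3, 1.5),
  DIV4 = c(15.2, 35.3, 1.2, 5.2)
)

# Take the median of every numeric column within each genotype,
# without listing the columns or the genotype values
res <- df %>%
  group_by(genotype) %>%
  summarise(across(where(is.numeric), median), .groups = "drop")

print(res)
```

Unlike the hand-written version, this keeps the original column names (DIV3, DIV4) instead of adding a _median suffix.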

Upvotes: 0

DatamineR

Reputation: 9628

Try this:

apply(df[,-1], 2, function(x) tapply(x, df[,1], median))

Upvotes: 2

bdevil

Reputation: 185

I found ddply to be the best for this.

library(plyr)
medians <- ddply(a, .(genotype), numcolwise(median))

Upvotes: 2
