Reputation: 53

How to calculate the median values without NAs?

I have a data frame like this:

df
name    var1  var2  var3  var4  var5 ...
site1    10    20    12    5     ..
site2    15    NA    11    2     ..
site3    NA    11    21    1     ..
site4    9     18    NA    6     ..
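
To try the answers below, the table can be rebuilt as a data frame roughly like this. This is a sketch: the values are typed from the table, and var5 and any further columns are left out because their contents are not shown.

```r
# Hypothetical reproduction of the table above (var5 and later columns omitted)
df <- data.frame(
  name = c("site1", "site2", "site3", "site4"),
  var1 = c(10, 15, NA, 9),
  var2 = c(20, NA, 11, 18),
  var3 = c(12, 11, 21, NA),
  var4 = c(5, 2, 1, 6)
)
str(df)
```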

I use this code to calculate the median of columns 2 to 4:

apply(df[,c(2:4)], 2, median)

But it gives NA for columns 2 to 4, because they contain NA values. How can I exclude the NA values and still calculate the median of the remaining data in each column? If I use na.rm=T on a subset, all rows with NAs will be removed, which is not what I want. Thanks for helping.
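
The NA comes from median() itself: by default it returns NA as soon as its input contains an NA. Its na.rm argument removes NAs only from the vector it is given, so calling it on one column at a time leaves the other columns untouched. A small illustration using the var1 values from the table (typed in by hand, so treat them as an assumption):

```r
# var1 column from the table in the question
var1 <- c(10, 15, NA, 9)

median(var1)                # NA: a single NA makes the result NA
median(var1, na.rm = TRUE)  # 10: only the NA inside this vector is dropped
```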

Upvotes: 2

Answers (3)

mcam

Reputation: 1

This should work:

for (i in 2:4) {
  # na.rm = TRUE drops NAs from this one column only
  print(median(df[, i], na.rm = TRUE))
}

Or with column names:

for (i in 2:4) {
  print(paste("Median", colnames(df)[i], "=", median(df[, i], na.rm = TRUE)))
}
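
If you need the medians for later use rather than printed, the same loop can fill a named vector. A minimal sketch, with hypothetical sample data shaped like the question's:

```r
# Hypothetical data: a label column plus three numeric columns with NAs
df <- data.frame(name = c("s1", "s2", "s3", "s4"),
                 var1 = c(10, 15, NA, 9),
                 var2 = c(20, NA, 11, 18),
                 var3 = c(12, 11, 21, NA))

# Preallocate a named vector, one slot per column of interest
cols <- 2:4
meds <- setNames(numeric(length(cols)), colnames(df)[cols])

for (i in cols) {
  # Each column is handled separately, so only its own NAs are dropped
  meds[colnames(df)[i]] <- median(df[, i], na.rm = TRUE)
}
meds
```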

Upvotes: -1

Verena Praher

Reputation: 1272

This works:

df<-data.frame("a"=c(1,2,3, 4), "b"=c(1,NA,4, 5))

medianWithoutNA<-function(x) {
   median(x[which(!is.na(x))])
}

apply(df, 2, medianWithoutNA)
  a   b 
2.5 4.0
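
The helper gives the same result as median()'s built-in na.rm argument, and logical indexing works without which(). A quick check with column b from the example above:

```r
medianWithoutNA <- function(x) {
  median(x[which(!is.na(x))])
}

b <- c(1, NA, 4, 5)
medianWithoutNA(b)        # 4
median(b, na.rm = TRUE)   # 4, using median's own argument
median(b[!is.na(b)])      # 4, logical indexing instead of which()
```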

By the way, you can write

apply(df[,2:4], 2, median)

without the c().

Please let me know whether the solution works for you, and if it does, accept my answer.

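If you want to compute the median per site instead, you can use aggregate:

df<-data.frame(name=c("site1", "site1", "site2", "site2", "site3"), a=c(1, 2, 3, 1, 3), b=c(3, 2, 3, 1,4))
# na.pass keeps rows that contain NAs; medianWithoutNA then drops them column by column
aggregate(cbind(a, b) ~ name, data=df, FUN=medianWithoutNA, na.action=na.pass)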
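
The formula interface of aggregate() uses na.action = na.omit by default, which removes every row that has an NA in any of the listed columns, the behaviour the question wants to avoid. A sketch comparing the default with na.action = na.pass, using hypothetical per-site data that contains NAs:

```r
medianWithoutNA <- function(x) {
  median(x[which(!is.na(x))])
}

# Hypothetical per-site data with NAs in both value columns
df <- data.frame(name = c("site1", "site1", "site2", "site2", "site3"),
                 a = c(1, NA, 3, 1, 3),
                 b = c(3, 2, NA, 1, 4))

# Default na.omit: rows 2 and 3 are dropped before the medians are taken
aggregate(cbind(a, b) ~ name, data = df, FUN = medianWithoutNA)

# na.pass keeps every row; NAs are removed separately within each column
res <- aggregate(cbind(a, b) ~ name, data = df, FUN = medianWithoutNA,
                 na.action = na.pass)
res
```

With na.pass, site1 gets b = 2.5 (the median of 3 and 2) instead of 3, because the row with a missing a is no longer discarded.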

Upvotes: 3

mnel

Reputation: 115392

Use lapply, which, unlike apply, does not convert the data frame to a matrix:

lapply(df[2:4], median, na.rm = TRUE)
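
A sketch of why this matters, with hypothetical data shaped like the question's: apply() first calls as.matrix() on the data frame, and a character column such as name turns the whole matrix into character. sapply() works column by column like lapply() but simplifies the result to a named numeric vector.

```r
df <- data.frame(name = c("site1", "site2", "site3", "site4"),
                 var1 = c(10, 15, NA, 9),
                 var2 = c(20, NA, 11, 18),
                 var3 = c(12, 11, 21, NA))

# The matrix apply() would work on is character because of 'name'
is.character(as.matrix(df))   # TRUE

# lapply() returns a list with one median per column
lapply(df[2:4], median, na.rm = TRUE)

# sapply() returns the same medians as a named numeric vector
meds <- sapply(df[2:4], median, na.rm = TRUE)
meds
```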

Upvotes: 7
