Reputation: 53
I have a dataframe like this:
df
name var1 var2 var3 var4 var5 ...
site1 10 20 12 5 ..
site2 15 NA 11 2 ..
site3 NA 11 21 1 ..
site4 9 18 NA 6 ..
I use this code to calculate the median of the columns.
apply(df[,c(2:4)], 2, median)
But it gives NA for columns 2 to 4, because they contain NA values. How can I exclude the NA values and still calculate the median from the remaining data in each column? If I use na.rm=T on a subset, all rows with NAs are removed, which is not what I want. Thanks for helping.
Upvotes: 2
Views: 23292
Reputation: 1
This should work:
for (i in 2:4) {
  print(median(df[, i], na.rm = TRUE))
}
Or with column names:
for (i in 2:4) {
  print(paste("Median", colnames(df)[i], "=", median(df[, i], na.rm = TRUE)))
}
Upvotes: -1
Reputation: 1272
This works:
df<-data.frame("a"=c(1,2,3, 4), "b"=c(1,NA,4, 5))
medianWithoutNA <- function(x) {
  median(x[which(!is.na(x))])
}
apply(df, 2, medianWithoutNA)
a b
2.5 4.0
By the way, you can write
apply(df[,2:4], 2, median)
without c()
Please tell me if the solution works for you and if yes, accept my answer.
This is the code if you want to compute the median per site:
df<-data.frame(name=c("site1", "site1", "site2", "site2", "site3"), a=c(1, 2, 3, 1, 3), b=c(3, 2, 3, 1,4))
aggregate(cbind(a, b) ~ name, data=df, medianWithoutNA)
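One caveat with the formula interface: aggregate() defaults to na.action = na.omit, so any row with an NA in a or b is dropped before the function is ever called. A minimal sketch of keeping those rows, assuming a made-up data frame with some NAs (the values below are illustrative, not from the question):

```r
# Hypothetical data: same layout as above, with two NAs added for illustration
df <- data.frame(name = c("site1", "site1", "site2", "site2", "site3"),
                 a = c(1, NA, 3, 1, 3),
                 b = c(3, 2, NA, 1, 4))

# Default behaviour: rows 2 and 3 are dropped entirely because each contains an NA
res_omit <- aggregate(cbind(a, b) ~ name, data = df, FUN = median)

# na.pass keeps every row; na.rm = TRUE is forwarded to median(),
# so NAs are skipped column by column instead of row by row
res_pass <- aggregate(cbind(a, b) ~ name, data = df, FUN = median,
                      na.rm = TRUE, na.action = na.pass)
print(res_pass)
```

With na.pass, the b value of the second site1 row still contributes to the median of b even though a is missing in that row.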
Upvotes: 3
Reputation: 115392
Use lapply, which does not convert to a matrix.
lapply(df[2:4], median, na.rm = TRUE)
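If a named vector is more convenient than a list, sapply can be used in the same way. A small sketch, using the values from the question's table (the columns shown there, with var1 to var3 assumed as the column names):

```r
# Data reconstructed from the question; only the first three value columns are used
df <- data.frame(name = c("site1", "site2", "site3", "site4"),
                 var1 = c(10, 15, NA, 9),
                 var2 = c(20, NA, 11, 18),
                 var3 = c(12, 11, 21, NA))

# na.rm = TRUE is passed on to median() and applied to each column separately,
# so only the NA in that column is skipped; no rows are removed from df
med <- sapply(df[2:4], median, na.rm = TRUE)
print(med)

# The original apply() call also works once na.rm = TRUE is passed through
med_apply <- apply(df[, 2:4], 2, median, na.rm = TRUE)
```

Both calls return one median per column, computed from the non-missing values of that column only.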
Upvotes: 7