Reputation: 177
Good morning, I have a lot of data that I need to run calculations on. There are 25 columns (variables), and each column contains thousands of values, including missing values. I calculated the mean with
colMeans(df, na.rm = TRUE)
How can I calculate the sd of each column while ignoring the NA values?
Upvotes: 3
Views: 37715
Reputation: 11
sd(variablename, na.rm = TRUE)
This works for me. Replace "variablename" with the name of the variable you use.
Upvotes: 0
Reputation: 1
As the function summarise_each() has been deprecated, here is an up-to-date example using dplyr:
df1 %>% summarise_all(funs(sd(., na.rm = TRUE)))
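Note that funs() itself has been deprecated since dplyr 0.8.0, and summarise_all() is superseded by across() from dplyr 1.0.0 onward. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later is installed; the toy df1 below is hypothetical and only stands in for the real data:

```r
library(dplyr)

# Hypothetical toy data frame standing in for df1
df1 <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 50)
)

# across() applies the lambda to every column;
# na.rm = TRUE makes sd() skip the missing values
sds <- df1 %>%
  summarise(across(everything(), ~ sd(.x, na.rm = TRUE)))
sds
```

If some columns are not numeric, everything() can probably be swapped for where(is.numeric) to restrict the calculation to numeric columns.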
Upvotes: 0
Reputation: 51592
You can try,
apply(df, 2, sd, na.rm = TRUE)
Because apply first coerces the data frame to a matrix (so a single non-numeric column turns every value into character), a more direct and safer option is to use lapply or sapply, as noted by @docendodiscimus,
sapply(df, sd, na.rm = TRUE)
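To make the sapply call concrete, here is a small sketch on a made-up data frame (the df and its values are hypothetical, not the asker's data). It should return a named numeric vector with one standard deviation per column:

```r
# Hypothetical toy data frame with missing values in each column
df <- data.frame(
  a = c(1, 2, NA, 4),
  b = c(10, NA, 30, 50)
)

# A data frame is a list of columns, so sapply visits each column
# and passes na.rm = TRUE through to sd()
col_sds <- sapply(df, sd, na.rm = TRUE)
col_sds

# Without na.rm = TRUE, any column containing an NA yields NA
sapply(df, sd)
```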
Upvotes: 12
Reputation: 887541
If we convert to matrix, colSds from matrixStats can be used:
library(matrixStats)
colSds(as.matrix(df), na.rm=TRUE)
Or we can use summarise_each from dplyr:
library(dplyr)
df1 %>%
summarise_each(funs(sd(., na.rm=TRUE)))
Upvotes: 3