Reputation: 971
Quick question. I read my CSV file into the variable data. It has a column labelled var, which contains numerical values.
When I run the command
sd(data$var)
I get
[1] NA
instead of my standard deviation.
Could you please help me figure out what I am doing wrong?
Upvotes: 19
Views: 72443
Reputation: 1979
I've made the mistake a time or two of reusing variable names in dplyr chains, which has caused issues like this.
library(dplyr)

mtcars %>%
  group_by(gear) %>%
  mutate(ave = mean(hp)) %>%
  ungroup() %>%
  group_by(cyl) %>%
  summarise(med = median(ave),
            ave = mean(ave), # should've named this variable something different
            sd = sd(ave))    # this is the sd of my newly created single-value "ave", not the original one, so it is NA
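For what it's worth, here is a minimal sketch of one way around it, assuming dplyr is installed: give each summary its own name so that sd() still sees the full grouped column. The names ave_mean and ave_sd are only illustrative.

```r
library(dplyr)

result <- mtcars %>%
  group_by(gear) %>%
  mutate(ave = mean(hp)) %>%
  ungroup() %>%
  group_by(cyl) %>%
  summarise(med = median(ave),
            ave_mean = mean(ave), # new name, so "ave" is not overwritten
            ave_sd = sd(ave))     # sd of the original per-row "ave" values

result
```

Because nothing inside summarise() overwrites ave, every group still has more than one value to work with and the sd column is no longer NA.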
Upvotes: 14
Reputation: 365
There may be Inf or -Inf values in the data.
Try
is.finite(data$var)
or
min(data$var, na.rm = TRUE)
max(data$var, na.rm = TRUE)
to check whether that is indeed the case.
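A rough sketch of that check, using a made-up vector x in place of the real column (the values are assumptions for illustration only). As far as I know, infinite values make sd() return NaN rather than NA, so this is worth ruling out but may not be the whole story.

```r
# Hypothetical vector standing in for data$var
x <- c(1, 2, Inf, -Inf, 4)

any_nonfinite <- any(!is.finite(x))  # TRUE if any Inf, -Inf, NA or NaN is present
n_infinite <- sum(is.infinite(x))    # counts only Inf and -Inf

sd_raw <- sd(x)                      # not a usable number while infinities are present

finite_x <- x[is.finite(x)]          # keep only the finite values
sd_finite <- sd(finite_x)

print(c(n_infinite = n_infinite, sd_finite = sd_finite))
```

Whether dropping the infinite values is appropriate depends on how they got into the data in the first place.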
Upvotes: 0
Reputation: 9380
Try sd(data$var, na.rm=TRUE) and any NAs in the column var will be ignored. It also pays to inspect your data to make sure the NAs really should be NAs and that nothing went wrong when the file was read in. Commands like head(data), tail(data), and str(data) should help with that.
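To illustrate the read-in point, here is a small sketch with a made-up CSV in which one missing value was typed as "n/a". The file contents and the "n/a" marker are assumptions; your file may use something else.

```r
# Hypothetical CSV contents; "n/a" is a stray text marker for a missing value
csv_text <- "var\n1.5\n2.5\nn/a\n4.0"

# Read as-is: the stray text forces the whole column to character
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(raw)

# Tell read.csv which strings mean "missing" so the column stays numeric
clean <- read.csv(text = csv_text, na.strings = c("NA", "n/a"))
str(clean)

n_missing <- sum(is.na(clean$var))    # how many values are missing
sd_all <- sd(clean$var)               # NA, because one value is missing
sd_ok <- sd(clean$var, na.rm = TRUE)  # computed from the remaining values
```

Counting the NAs with sum(is.na(...)) before reaching for na.rm = TRUE makes it easier to tell a handful of genuine gaps from a column that was misread.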
Upvotes: 35
Reputation: 29525
You probably have missing values in var, or the column is not numeric, or there's only one row.
Try removing missing values, which will help in the first case:
sd(dat$var, na.rm = TRUE)
If that doesn't work, check that class(dat$var) is "numeric" (the second case) and that nrow(dat) is greater than 1 (the third case).
Finally, data is a function in R, so it's best to use a different name, which I've done here.
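The three checks can be rolled into one small helper. This is only a sketch: why_sd_na is a hypothetical name, not a base R function, and it uses is.numeric(), which also accepts integer columns, instead of comparing class() to "numeric".

```r
# Hypothetical helper: suggest why sd() might return NA for a vector
why_sd_na <- function(x) {
  if (!is.numeric(x)) {
    return("not numeric")
  }
  n_ok <- sum(!is.na(x))  # number of non-missing values
  if (n_ok < 2) {
    return("fewer than two non-missing values")
  }
  if (n_ok < length(x)) {
    return("missing values; try na.rm = TRUE")
  }
  "no obvious problem"
}

dat <- data.frame(var = c(3, NA, 7))
why_sd_na(dat$var)      # column with a missing value
why_sd_na(c("3", "7"))  # numbers stored as text
why_sd_na(5)            # only one value
```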
Upvotes: 6