Reputation: 663
I have a dataframe that looks like this:
dat <- structure(list(cohort = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "ADC8_AA", class = "factor"),
status = c(1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, -9L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 1L, 2L, -9L, 2L, 1L, -9L, 2L), age_onset = c(NA,
NA, NA, NA, 63, NA, 79, NA, 67, 71, 81, NA, NA, NA, NA, 73,
NA, 66, 77, 68, 75, NA, NA, NA, NA, 76, 79, NA, NA, NA, NA,
NA, 70, NA, 77, 84, 78, 76, NA, 92, 64, 60, 72, NA, 81, NA,
62, NA, 82, 74)), row.names = c(NA, 50L), class = "data.frame")
I am trying to get the mean and SD like this, but I get NA for the SD of the group where status == -9. What could be the reason, and how do I do this correctly?
> aggregate(age_onset~cohort+status, data = dat, mean, na.action = na.omit)
   cohort status age_onset
1 ADC8_AA     -9  82.00000
2 ADC8_AA      2  73.54167
> aggregate(age_onset~cohort+status, data = dat, sd)
   cohort status age_onset
1 ADC8_AA     -9        NA
2 ADC8_AA      2  7.661191
Upvotes: 1
Reputation: 887541
We can use dplyr:
library(dplyr)
dat %>%
  group_by(cohort, status) %>%
  summarise(Mean = mean(age_onset, na.rm = TRUE),
            SD = sd(age_onset, na.rm = TRUE))
Upvotes: 0
Reputation: 146010
Try this:
aggregate(age_onset~cohort+status, data = dat, sd, na.rm = TRUE)
#    cohort status age_onset
# 1 ADC8_AA     -9        NA
# 2 ADC8_AA      2  7.661191
You can use the ... argument of aggregate to pass na.rm = TRUE through to sd.
You will still get NA for any group that has only a single non-missing value. This is because the sample standard deviation divides by n - 1, so it isn't defined for a single value. That is the case for status == -9 here:
subset(dat, status == -9)
#     cohort status age_onset
# 23 ADC8_AA     -9        NA
# 46 ADC8_AA     -9        NA
# 49 ADC8_AA     -9        82
sd(82)
# [1] NA
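If it helps to see which groups are affected, one option is to report the number of non-missing values next to the mean and SD. The following is only a sketch on made-up data: the toy data frame and the summ helper are hypothetical names, not part of the question. It relies on the formula interface of aggregate dropping NA rows by default (na.action = na.omit).

```r
# Toy data (made up): group "a" has one non-missing value, group "b" has three.
toy <- data.frame(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(NA, 82, 70, 75, 80)
)

# Hypothetical helper: mean, SD, and count of the values passed in.
# With the formula interface, rows with NA are already removed before FUN runs.
summ <- function(v) c(mean = mean(v), sd = sd(v), n = length(v))

res <- aggregate(x ~ grp, data = toy, FUN = summ)

# aggregate() stores the three statistics in one matrix column;
# flatten it into separate columns x.mean, x.sd, x.n.
res <- do.call(data.frame, res)
print(res)
```

A group with n equal to 1 will show NA in the SD column, which makes it clear that the NA comes from the group size rather than from missing values.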
Upvotes: 3