Reputation: 65
I am trying to calculate descriptive statistics for the birthweight data set (birthwt
) found in RStudio. However, I'm only interested in a few variables: age
, ftv
, ptl
and lwt
.
This is the code I have so far:
library(MASS)
library(dplyr)
data("birthwt")
grouped <- group_by(birthwt, age, ftv, ptl, lwt)
summarise(grouped,
mean = mean(bwt),
median = median(bwt),
SD = sd(bwt))
It gives me a pretty-printed table but only a limited number of the SD is filled and the rest say NA
. I just can't work out why or how to fix it!
Upvotes: 5
Views: 7926
Reputation: 696
I stumbled here for another reason and also for me, the answer comes from the docs:
# BEWARE: reusing variables may lead to unexpected results
mtcars %>%
group_by(cyl) %>%
summarise(disp = mean(disp), sd = sd(disp))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#> cyl disp sd
#> <dbl> <dbl> <dbl>
#> 1 4 105. NA
#> 2 6 183. NA
#> 3 8 353. NA
So, in case someone has the same reason as me, instead of reusing a variable, create new ones:
mtcars %>%
group_by(cyl) %>%
summarise(
disp_mean = mean(disp),
disp_sd = sd(disp)
)
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
cyl disp_mean disp_sd
<dbl> <dbl> <dbl>
1 4 105. 26.9
2 6 183. 41.6
3 8 353. 67.8
Upvotes: 14
Reputation: 887541
The number of rows for some of the groups are 1.
grouped %>%
summarise(n = n())
# A tibble: 179 x 5
# Groups: age, ftv, ptl [?]
# age ftv ptl lwt n
# <int> <int> <int> <int> <int>
# 1 14 0 0 135 1
# 2 14 0 1 101 1
# 3 14 2 0 100 1
# 4 15 0 0 98 1
# 5 15 0 0 110 1
# 6 15 0 0 115 1
# 7 16 0 0 110 1
# 8 16 0 0 112 1
# 9 16 0 0 135 2
#10 16 1 0 95 1
According to ?sd
,
The standard deviation of a length-one vector is NA.
This results in NA
values for the sd
where there is only one element
Upvotes: 2