Reputation: 5819
We can group mtcars by cylinder and summarize miles per gallon with some simple code.
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg))
This provides the correct output shown below.
cyl avg
1 4 26.66364
2 6 19.74286
3 8 15.10000
If I kindly ask dplyr to exclude NA I get some weird results.
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(!is.na(mpg)))
Since there are no NA values in this data set, the results should be the same as above, but every group's average comes out as exactly 1. Is this a problem with my code or a bug in dplyr?
cyl avg
1 4 1
2 6 1
3 8 1
My actual data set does have some NA that I need to exclude only for this summarization, but exhibits the same behavior.
Upvotes: 0
Views: 396
Reputation: 1058
You want this:
mtcars %>%
  group_by(cyl) %>%
  summarise(avg = mean(mpg, na.rm = TRUE))
# A tibble: 3 x 2
cyl avg
<dbl> <dbl>
1 4 26.66364
2 6 19.74286
3 8 15.10000
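Right now, you're passing a logical vector, !is.na(mpg), to mean(). When you take the mean() of a logical vector, TRUE is coerced to 1 and FALSE to 0, so you get the proportion of non-missing values rather than the numeric average you want. Since mtcars has no NA in mpg, every element is TRUE and every group's result is 1.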
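To see why every group came out as 1, here is a minimal sketch on a made-up vector (the values in x are an assumption for illustration, not taken from the question's data). It compares the mean of the logical mask with the two usual ways of dropping NA.

```r
# Hypothetical toy vector with one missing value.
x <- c(10, 20, NA, 30)

# The mask itself is logical: TRUE where a value is present.
present <- !is.na(x)

# mean() of a logical vector counts TRUE as 1 and FALSE as 0,
# so this is the share of non-missing values, not the average of x.
prop_present <- mean(present)        # 3 of 4 values present

# Either of these averages only the non-missing values.
avg_na_rm  <- mean(x, na.rm = TRUE)  # drop NA inside mean()
avg_subset <- mean(x[present])       # drop NA by subsetting first

print(c(prop_present = prop_present,
        avg_na_rm    = avg_na_rm,
        avg_subset   = avg_subset))
```

With no NA at all, as in mtcars, the mask is all TRUE, which is why the question's output was 1 for every cyl.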
Upvotes: 5
Reputation: 1820
The way you have coded it, the input to the mean() function is a vector of TRUE and FALSE values. Use mean(mpg[!is.na(mpg)]) instead.
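As a rough check that subsetting behaves as intended when NA values really are present, the sketch below uses a copy of mtcars with a few mpg values blanked out. The name mt_na and the choice of rows 1, 3 and 5 are assumptions made only for illustration; they are not from the question's actual data.

```r
library(dplyr)

# Assumption: a copy of mtcars with three mpg values set to NA.
# Rows 1, 3 and 5 are arbitrary (one car each with 6, 4 and 8 cylinders).
mt_na <- mtcars
mt_na$mpg[c(1, 3, 5)] <- NA

res <- mt_na %>%
  group_by(cyl) %>%
  summarise(
    avg_subset = mean(mpg[!is.na(mpg)]),   # drop NA by subsetting
    avg_na_rm  = mean(mpg, na.rm = TRUE),  # drop NA inside mean()
    n_present  = sum(!is.na(mpg))          # count of non-missing mpg
  )

print(res)
```

Both average columns should agree for each group, and n_present shows how many values each average is based on.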
Consider using data.table, which I have used for illustration purposes. The following all produce the same result.
library(data.table)
MT <- as.data.table(mtcars)  # MT is not defined in the post; assumed to be mtcars as a data.table
MT[, mean(mpg), by = cyl]
cyl V1
1: 6 19.74286
2: 4 26.66364
3: 8 15.10000
MT[, mean(mpg, na.rm=TRUE), by = cyl]
cyl V1
1: 6 19.74286
2: 4 26.66364
3: 8 15.10000
MT[, mean(mpg[!is.na(mpg)]), by = cyl]
cyl V1
1: 6 19.74286
2: 4 26.66364
3: 8 15.10000
Upvotes: 0