I am trying to get the number of canceled flights per month alongside the month and mean arrival delay columns. Instead, I can only seem to get the total number of canceled flights repeated next to every month.
Here is my code:
library(dplyr)
library(nycflights13)
flights = nycflights13::flights
flights %>%
  select(arr_delay, month, dep_time) %>%
  group_by(month) %>%
  summarise(
    Mean = mean(arr_delay, na.rm = TRUE),
    canceled = count(filter(flights, is.na(dep_time)))
  )
   month   Mean canceled$n
   <int>  <dbl>      <int>
 1     1  6.13        8255
 2     2  5.61        8255
 3     3  5.81        8255
 4     4 11.2         8255
 5     5  3.52        8255
 6     6 16.5         8255
 7     7 16.7         8255
 8     8  6.04        8255
 9     9 -4.02        8255
10    10 -0.167       8255
11    11  0.461       8255
12    12 14.9         8255
Upvotes: 0
Views: 263
By calling filter(flights, ..) within the summarise of a grouped frame, you're looking at the entire flights frame, not the data present within the current group. That is why every month shows the same total, 8255.
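To make the difference concrete, here is a minimal sketch on a small made-up frame. The toy tibble and the column names whole_frame and this_group are assumptions for illustration only; they are not part of nycflights13.

```r
library(dplyr)

# Hypothetical toy data: an NA in dep_time marks a canceled flight
toy <- tibble(
  month    = c(1, 1, 1, 2, 2),
  dep_time = c(517, NA, 533, NA, NA)
)

res <- toy %>%
  group_by(month) %>%
  summarise(
    # refers to `toy` by name, so grouping is ignored: same total in every row
    whole_frame = nrow(filter(toy, is.na(dep_time))),
    # refers to the column only, so it is evaluated once per group
    this_group  = sum(is.na(dep_time))
  )

res
```

Here whole_frame is 3 for both months, while this_group is 1 for month 1 and 2 for month 2.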
I suggest
flights %>%
  select(arr_delay, month, dep_time) %>%
  group_by(month) %>%
  summarise(
    Mean = mean(arr_delay, na.rm = TRUE),
    canceled = sum(is.na(dep_time))
  )
# # A tibble: 12 x 3
#    month   Mean canceled
#    <int>  <dbl>    <int>
#  1     1  6.13       521
#  2     2  5.61      1261
#  3     3  5.81       861
#  4     4 11.2        668
#  5     5  3.52       563
#  6     6 16.5       1009
#  7     7 16.7        940
#  8     8  6.04       486
#  9     9 -4.02       452
# 10    10 -0.167      236
# 11    11  0.461      233
# 12    12 14.9       1025
This is the same rationale as not referring to the original frame by name inside a pipeline. For instance, if we try
mtcars %>%
  filter(disp > 350) %>%
  summarize(
    mu1 = mean(mtcars$mpg),
    mu2 = mean(mpg)
  )
#        mu1      mu2
# 1 20.09062 14.78571
mtcars originally has 32 rows, but only 7 rows after filter(disp > 350). For the calculation of mu1, we are reaching out to look at the original mtcars, all 32 rows of it; for mu2, we are only looking at the rows present in the data at that point in time, only 7 rows in this example.
So any time you start a pipe with an object, the only reason you should ever use that object's name again inside a dplyr verb is if you intentionally want to look at the original state of the frame. In your case, I think you did not: you needed to look at the grouped/filtered data at that point in the pipeline.
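For contrast, here is a sketch of a case where reaching back to the original frame is arguably intentional: comparing the filtered rows against the full data. The column names n_kept and share_of_all are made up for this illustration.

```r
library(dplyr)

res <- mtcars %>%
  filter(disp > 350) %>%
  summarise(
    n_kept       = n(),                 # rows left at this point in the pipe
    share_of_all = n() / nrow(mtcars)   # deliberately divides by the original row count
  )

res
```

With the 7 remaining rows out of the original 32, share_of_all comes out to 7/32.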
Upvotes: 1