OFAJ
OFAJ

Reputation: 3

How to calculate the number of flights with an specific condition

I'm using the nycflights13::flights dataframe and want to calculate the number of flights an airplane have flown before its first more than 1 hour delay. How can I do this? I've tried with a group_by and filter, but I haven't been able to. Is there a method to count the rows till a condition (e.g. till the first dep_delay >60)?

Thanks.

Upvotes: 0

Views: 1440

Answers (2)

Edward
Edward

Reputation: 18683

library(dplyr)
library(nycflights13)
data("flights")

There may be more elegant ways, but this code counts the total number of flights made by each plane (omitting cancelled flights) and joins this with flights that were not cancelled, grouping on the unique plane identifier (tailnum), sorting on departure date/time, assigning the row_number less 1, filtering on delays>60, and taking the first row.

select(
  filter(flights, !is.na(dep_time)) %>% 
  count(tailnum, name="flights") %>% left_join(
      filter(flights, !is.na(dep_time)) %>% 
      group_by(tailnum) %>%
      arrange(month, day, dep_time) %>%
      mutate(not_delayed=row_number() -1) %>%
      filter(dep_delay>60) %>% slice(1)), 
  tailnum, flights, not_delayed)

# A tibble: 4,037 x 3
   tailnum flights not_delayed
   <chr>     <int>       <dbl>
 1 D942DN        4           0
 2 N0EGMQ      354          53
 3 N10156      146           9
 4 N102UW       48          25
 5 N103US       46          NA
 6 N104UW       47           3
 7 N10575      272           0
 8 N105UW       45          22
 9 N107US       41          20
10 N108UW       60          36
# ... with 4,027 more rows

The plane with tailnum N103US has made 46 flights, of which none have been delayed by more than 1 hour. So the number of flights it has made before its first 1 hour delay is undefined or NA.

Upvotes: 1

OFAJ
OFAJ

Reputation: 3

I got the answer:

flights %>%
#Eliminate the NAs
filter(!is.na(dep_time)) %>% 
#Sort by date and time
arrange(time_hour) %>% 
group_by(tailnum) %>%
#cumulative number of flights delayed more than one hour
mutate(acum_delay = cumsum(dep_delay > 60)) %>% 
#count the number of flights                                         
summarise(before_1hdelay = sum(acum_delay < 1))

Upvotes: 0

Related Questions