Andrew C

Reputation: 79

Ontime percentage calculations

I need to calculate the overall on-time percentage for each airline across both destinations in this sample dataset.

library(tidyverse)  # attaches dplyr and tibble, so a separate library(dplyr) is not needed

df_chi <- tribble(
  ~airline, ~ontime, ~qty,~dest,
  'delta',TRUE,527,'CHI',
  'delta',FALSE,92,'CHI',
  'american',TRUE,4229,'CHI',
  'american',FALSE,825,'CHI'
)

df_nyc <- tribble(
  ~airline, ~ontime, ~qty,~dest,
  'delta',TRUE,1817,'NYC',
  'delta',FALSE,567,'NYC',
  'american',TRUE,1651,'NYC',
  'american',FALSE,625,'NYC'
)

I have a solution, but it is verbose and I want to avoid the numeric index, i.e. [2,2]. Is there a more elegant way that uses more of the tidyverse?

df_all <- bind_rows(df_chi,df_nyc)
delta_ot  <- df_all %>% 
  filter(airline == "delta") %>%
  group_by(ontime) %>%
  summarize(total = sum(qty))

delta_ot <- delta_ot[2,2] / sum(delta_ot$total)

american_ot  <- df_all %>% 
  filter(airline == "american") %>%
  group_by(ontime) %>%
  summarize(total = sum(qty))

american_ot <- american_ot[2,2] / sum(american_ot$total)

Upvotes: 0

Views: 30

Answers (1)

akrun

Reputation: 887068

Because the ontime column is logical, use it to subset instead of [2, 2]. Also, rather than filtering once per airline, add 'airline' as a grouping column so everything is computed in one pass.

library(dplyr)
bind_rows(df_chi, df_nyc) %>% 
  group_by(airline, ontime)  %>% 
  summarise(total = sum(qty), .groups = 'drop_last') %>% 
  summarise(total = total[ontime]/sum(total))

-output

# A tibble: 2 × 2
  airline  total
  <chr>    <dbl>
1 american 0.802
2 delta    0.781

Subsetting with a logical vector returns the elements at the positions where the index is TRUE:

> c(1, 3, 5)[c(FALSE, TRUE, FALSE)]
[1] 3
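
The same logical-subsetting idea can also be applied directly to qty, which may let you skip the intermediate per-ontime totals. The following is only a sketch under the assumption that the data look like the sample in the question; the names ontime_pct and pct_ontime are made up for illustration.

```r
library(dplyr)
library(tibble)

# Sample data copied from the question
df_chi <- tribble(
  ~airline, ~ontime, ~qty, ~dest,
  'delta', TRUE, 527, 'CHI',
  'delta', FALSE, 92, 'CHI',
  'american', TRUE, 4229, 'CHI',
  'american', FALSE, 825, 'CHI'
)

df_nyc <- tribble(
  ~airline, ~ontime, ~qty, ~dest,
  'delta', TRUE, 1817, 'NYC',
  'delta', FALSE, 567, 'NYC',
  'american', TRUE, 1651, 'NYC',
  'american', FALSE, 625, 'NYC'
)

# qty[ontime] keeps only the on-time counts within each airline;
# dividing by sum(qty) gives the overall on-time share.
ontime_pct <- bind_rows(df_chi, df_nyc) %>%
  group_by(airline) %>%
  summarise(pct_ontime = sum(qty[ontime]) / sum(qty))

print(ontime_pct)
```

On the sample data this should give the same two values as the output above (about 0.802 for american and 0.781 for delta). Unlike total[ontime], it does not depend on each airline having exactly one TRUE row after the first summarise.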

Upvotes: 1
