How to create overall itinerary

Question

I have a dataset of trips that looks like this:

from <- c("NYC","PAR", "MAD")
to <- c('PAR', 'SYD', "BCN")
date <- c("05/07","05/07", "06/08")
step <- c(1, 2, 1)
df <- data.frame(from, to, date, step)

The "step" column gives the position of each leg within a trip. Some trips are a single leg (like MAD-BCN), but others have a stopover (like NYC-PAR-SYD), and sometimes two.

I would like to create a column that summarizes the complete trip when it has more than one leg. In this example, the new "trip" column should contain "NYC-PAR-SYD" for the first two rows (NYC-PAR and PAR-SYD) and just "MAD-BCN" for the third row.

The problem is that there are hundreds of thousands of rows, so I am looking for an efficient way to do it.

AnilGoyal · Accepted Answer

A simple tidyverse strategy, shown on a more elaborate example:

df <- data.frame(
  stringsAsFactors = FALSE,
              from = c("ABC", "BCD", "DEF", "FGH", "CDE", "IJK", "JKL", "LMN"),
                to = c("BCD", "DEF", "FGH", "GHI", "GHI", "JKL", "LMN", "OPQ"),
              step = c(1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L)
)

df
#>   from  to step
#> 1  ABC BCD    1
#> 2  BCD DEF    2
#> 3  DEF FGH    3
#> 4  FGH GHI    4
#> 5  CDE GHI    1
#> 6  IJK JKL    1
#> 7  JKL LMN    2
#> 8  LMN OPQ    1
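
The pipeline that follows hinges on one idea: `step` restarts at 1 whenever a new trip begins, so a running count of `step == 1` gives every leg of the same trip the same id. A minimal sketch of just that step, using the `step` values from the data above (it assumes the rows are already in travel order):

```r
# step values copied from the example data frame
step <- c(1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L)

# TRUE counts as 1, so the cumulative sum increases at every trip start
id <- cumsum(step == 1)
id
#> [1] 1 1 1 1 2 3 3 4
```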

library(tidyverse)

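df %>%
  # a new trip starts wherever step == 1; the running count is the trip id
  mutate(id = cumsum(step == 1)) %>%
  split(.$id) %>%
  map_dfr(~ .x %>%
            # stack 'from' and 'to' into a single 'value' column, in row order
            pivot_longer(c('from', 'to')) %>%
            # keep the first occurrence of each stop and join them
            summarise(id = first(id),
                      itinerary = paste(value[!duplicated(value)], collapse = '->')))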

#> # A tibble: 4 x 2
#>      id itinerary              
#>   <int> <chr>                  
#> 1     1 ABC->BCD->DEF->FGH->GHI
#> 2     2 CDE->GHI               
#> 3     3 IJK->JKL->LMN          
#> 4     4 LMN->OPQ

Created on 2021-06-01 by the reprex package (v2.0.0)
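
The question asks for a `trip` column on every row rather than one summary row per trip. A possible variant, sketched on the question's own data, keeps the same `cumsum(step == 1)` id but uses a grouped `mutate()` instead of `split()` plus `map_dfr()`. The names `trips` and `result` are just for illustration, and the sketch assumes that rows are in travel order and that each leg departs from where the previous one arrived. Because it joins the first `from` with every `to` instead of dropping duplicates, a round trip such as NYC-PAR-NYC keeps its final stop.

```r
library(dplyr)

# data from the question (the data frame name is an arbitrary choice)
trips <- data.frame(
  from = c("NYC", "PAR", "MAD"),
  to   = c("PAR", "SYD", "BCN"),
  date = c("05/07", "05/07", "06/08"),
  step = c(1, 2, 1)
)

result <- trips %>%
  mutate(id = cumsum(step == 1)) %>%   # trip id: increases at each step == 1
  group_by(id) %>%
  # first origin followed by every destination of the trip, repeated on each row
  mutate(trip = paste(c(first(from), to), collapse = "-")) %>%
  ungroup()

result$trip
#> [1] "NYC-PAR-SYD" "NYC-PAR-SYD" "MAD-BCN"
```

On hundreds of thousands of rows this does a single grouped pass instead of building one small data frame per trip, which may help, although it has not been benchmarked here.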
