jesha
jesha

Reputation: 11

How to match value of one row with its immediate another row in different column in R

So I have this data of customers buying airline tickets online. I want to see how many of them have booked return tickets. So basically I want to match the origin city to the destination city of the immediate row and vice versa for the same PERSON and Account that will give me a data with their two way travel and then I want to calculate the days of their journey. I am trying to do this in R but I am unable to match origin with destination of the immediate row and vice versa.

I have sorted the account numbers of customers to manually see if there are any return journeys and there are quite a lot.

The data is like this:

Account number          origin city Destination city    Date
1                     London    chicago              7/22/2018
2                      Milan    London               7/23/2018
2                      London    Milan               7/28/2018
1                     chicago    london              8/22/2018

Upvotes: 1

Views: 46

Answers (1)

Jon Spring
Jon Spring

Reputation: 66520

Another option would be to join to itself, with the fields reversed.

Edit: Added "trip_num" to deal better with repeated trips by the same person.

library(dplyr)
# First, convert date field to Date type
df <- df %>% 
  mutate(Date = lubridate::mdy(Date)) %>%
  # update with M-M's suggestion in comments
  mutate_at(.vars = vars(origin_city, Destination_city), .funs = toupper) %>%
  # EDIT: adding trip_num to protect against extraneous joins for repeat trips
  group_by(Account_number, origin_city, Destination_city) %>%
  mutate(trip_num = row_number()) %>%
  ungroup()

df2 <- df %>%
  left_join(df, by = c("Account_number", "trip_num",
                       "origin_city" = "Destination_city",
                       "Destination_city" = "origin_city")) %>%
  mutate(days = (Date.x - Date.y)/lubridate::ddays(1))


> df2
# A tibble: 6 x 7
  Account_number origin_city Destination_city Date.x     trip_num Date.y      days
           <int> <chr>       <chr>            <date>        <int> <date>     <dbl>
1              1 LONDON      CHICAGO          2018-07-22        1 2018-08-22   -31
2              2 MILAN       LONDON           2018-07-23        1 2018-07-28    -5
3              2 LONDON      MILAN            2018-07-28        1 2018-07-23     5
4              1 CHICAGO     LONDON           2018-08-22        1 2018-07-22    31
5              2 MILAN       LONDON           2018-08-23        2 2018-08-28    -5
6              2 LONDON      MILAN            2018-08-28        2 2018-08-23     5

Data: (added repeat trip by Account_number 2)

df <- read.table(
  header = T, 
  stringsAsFactors = F,
  text = "Account_number          origin_city Destination_city    Date
1                     London    chicago              7/22/2018
2                      Milan    London               7/23/2018
2                      London    Milan               7/28/2018
1                     chicago    london              8/22/2018
2                      Milan    London               8/23/2018
2                      London    Milan               8/28/2018")

Upvotes: 2

Related Questions