XCCH004
XCCH004

Reputation: 323

Is there method in R to replace values in one data frame with a related value from another data frame?

I need to replace the values in one data frame with values from another data frame. I am trying to find the simplest way to do so but I may be over thinking it.

Here is a sample of the data from DF1:

season  article 1st_booking 2nd_booking
SS20    EF0647  2019-06-25  2019-07-09  
SS20    059611  2019-07-30  2019-08-13  
SS20    EG3208  2019-10-29  <NA>    
SS20    EF9348  2019-10-29  2019-11-12  
SS20    EE4609  2019-08-27  2019-10-29  
SS20    EF7610  2019-09-24  2019-10-29  
SS20    EH1307  2019-09-24  2019-10-29  
SS20    EH1308  2019-09-24  2019-10-29  
SS20    EH1309  2019-09-24  2019-10-29  
SS20    EH1310  2019-09-24  2019-10-29  

And from DF2:

season  article order_cutoff  booking_deadline
SS20    EF0647  2019-06-25  2019-06-07  
SS20    EF0647  2019-07-09  2019-06-07  
SS20    EF0647  2019-12-10  2019-11-08  
SS20    059611  2019-07-30  2019-07-12  
SS20    059611  2019-08-13  2019-07-12  
SS20    059611  2019-10-08  2019-09-06  
SS20    EG3208  2019-10-29  2019-10-11  
SS20    EF9348  2019-10-29  2019-10-11  
SS20    EF9348  2019-11-12  2019-10-11  
SS20    EF9348  2019-11-26  2019-11-08

Note that 1st_booking & 2nd_booking from DF1 are called 'order_cutoff' in DF2. What I would like to do is in DF1, replace the values in columns 1st_booking & 2nd_booking with the related booking_deadline from DF2. Tried to do a merge but I don't want to create a new column - just replace the values in DF1 with the values in DF2

Upvotes: 0

Views: 99

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389135

I am not exactly sure about the expected output. If you want to match 1st_booking and 2nd_booking to order_cutoff for each article and season, we can get the data in long format do a left_join matching the corresponding columns and get the data in wide format again.

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(cols = ends_with("booking")) %>%
  left_join(df2, by = c('season' = 'season', 'article' = 'article', 
                        'value' = 'order_cutoff')) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = booking_deadline)

# A tibble: 10 x 4
#   season article `1st_booking` `2nd_booking`
#   <fct>  <chr>   <fct>         <fct>        
# 1 SS20   EF0647  2019-06-07    2019-06-07   
# 2 SS20   059611  2019-07-12    2019-07-12   
# 3 SS20   EG3208  2019-10-11    NA           
# 4 SS20   EF9348  2019-10-11    2019-10-11   
# 5 SS20   EE4609  NA            NA           
# 6 SS20   EF7610  NA            NA           
# 7 SS20   EH1307  NA            NA           
# 8 SS20   EH1308  NA            NA           
# 9 SS20   EH1309  NA            NA           
#10 SS20   EH1310  NA            NA     

If you want to only combine by dates and not season and article you can use match

transform(df1,
    `1st_booking` = df2$booking_deadline[match(`1st_booking`, df2$order_cutoff)], 
    `2nd_booking` = df2$booking_deadline[match(`2nd_booking`, df2$order_cutoff)])

data

df1 <- structure(list(season = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "SS20", class = "factor"), article = structure(c(3L, 
1L, 6L, 5L, 2L, 4L, 7L, 8L, 9L, 10L), .Label = c("059611", "EE4609", 
"EF0647", "EF7610", "EF9348", "EG3208", "EH1307", "EH1308", "EH1309", 
"EH1310"), class = "factor"), `1st_booking` = structure(c(1L, 
2L, 5L, 5L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("2019-06-25", 
"2019-07-30", "2019-08-27", "2019-09-24", "2019-10-29"), class = "factor"), 
`2nd_booking` = structure(c(2L, 3L, 1L, 5L, 4L, 4L, 4L, 4L, 
4L, 4L), .Label = c("<NA>", "2019-07-09", "2019-08-13", "2019-10-29", 
"2019-11-12"), class = "factor")), class = "data.frame", row.names = c(NA, -10L))

df2 <- structure(list(season = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "SS20", class = "factor"), article = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, 4L, 3L, 3L, 3L), .Label = c("059611", "EF0647", 
"EF9348", "EG3208"), class = "factor"), order_cutoff = structure(c(1L, 
2L, 9L, 3L, 4L, 5L, 6L, 6L, 7L, 8L), .Label = c("2019-06-25", 
"2019-07-09", "2019-07-30", "2019-08-13", "2019-10-08", "2019-10-29", 
"2019-11-12", "2019-11-26", "2019-12-10"), class = "factor"), 
booking_deadline = structure(c(1L, 1L, 5L, 2L, 2L, 3L, 4L, 
4L, 4L, 5L), .Label = c("2019-06-07", "2019-07-12", "2019-09-06", 
"2019-10-11", "2019-11-08"), class = "factor")), class = "data.frame", 
row.names = c(NA, -10L))

Upvotes: 1

Related Questions