Nuller
Nuller

Reputation: 107

Change date in a data.frame based on matching string in a date vector

Background: I have a large data.frame consisting of values spanning from January 2021 until December 2021. However, the problem is that these dates do not match my dates in my SQL database. Thus, I have created a "ChangeDates" column which contains the "correct dates:

ChangeDates = c(
  "2021-01-29",
  "2021-02-26",
  "2021-03-31")

Furthermore, I have created a data.frame that consits of the dates that I want to change:

df_dates = data.frame(
  Date = c("2021-01-27", "2021-01-27","2021-01-27", "2021-01-27", "2021-01-27", "2021-01-27", "2021-01-27",
           "2021-01-27" ,"2021-01-27", "2021-01-27", "2021-01-27" ,"2021-01-27" ,"2021-01-27", "2021-01-27" ,
           "2021-01-27", "2021-01-27" ,"2021-01-27" ,"2021-01-27", "2021-01-27", "2021-01-27" ,"2021-01-27", "2021-01-27",
           "2021-01-27", "2021-01-27" ,"2021-01-27" ,"2021-01-27" ,"2021-01-27" ,"2021-01-27", "2021-01-27", "2021-01-27",
           "2021-01-27", "2021-02-27", "2021-02-27", "2021-02-27" ,"2021-02-27", "2021-02-27" ,"2021-02-27" ,"2021-02-27", "2021-02-27",
           "2021-02-27", "2021-02-27" ,"2021-02-27", "2021-02-27" ,"2021-02-27" ,"2021-03-29" ,"2021-03-29", "2021-03-29" ,"2021-03-29")
)

Goal: My goal is to somehow "dynamically" change all the "2021-01-27" dates to "2021-01-29", the "2021-02-27" dates to "2021-02-26" and so on. So, the only manual thing I have to do is to adjust the dates in the "ChangeDates" vector and then the dates in df_dates are corrected as such.

I tried out with the "grepl" function. For example, if "2021-01" is found in df_dates and in ChangeDates, then change the cell value in df_dates to the value in ChangeDates.. However, it seems that I have a bit of trouble in doing so..

Any ideas or suggestions would be much appreciated.

Upvotes: 0

Views: 71

Answers (2)

IceCreamToucan
IceCreamToucan

Reputation: 28705

First, create a table of "from" and "to" dates, and check that it is accurate.

library(dplyr, warn.conflicts = FALSE)

change_table <- tibble(from = unique(df_dates$Date), to = ChangeDates)
change_table
#> # A tibble: 3 × 2
#>   from       to        
#>   <chr>      <chr>     
#> 1 2021-01-27 2021-01-29
#> 2 2021-02-27 2021-02-26
#> 3 2021-03-29 2021-03-31

If so, join this to your data frame and coalesce the to with your Date variable.

df_dates %>% 
  left_join(change_table, by = c('Date' = 'from')) %>% 
  mutate(Date = coalesce(to, Date), 
         to = NULL)
#>          Date
#> 1  2021-01-29
#> 2  2021-01-29
#> 3  2021-01-29
#> 4  2021-01-29
#> 5  2021-01-29
#> 6  2021-01-29
#> 7  2021-01-29
#> 8  2021-01-29
#> 9  2021-01-29
#> 10 2021-01-29
#> 11 2021-01-29
#> 12 2021-01-29
#> 13 2021-01-29
#> 14 2021-01-29
#> 15 2021-01-29
#> 16 2021-01-29
#> 17 2021-01-29
#> 18 2021-01-29
#> 19 2021-01-29
#> 20 2021-01-29
#> 21 2021-01-29
#> 22 2021-01-29
#> 23 2021-01-29
#> 24 2021-01-29
#> 25 2021-01-29
#> 26 2021-01-29
#> 27 2021-01-29
#> 28 2021-01-29
#> 29 2021-01-29
#> 30 2021-01-29
#> 31 2021-01-29
#> 32 2021-02-26
#> 33 2021-02-26
#> 34 2021-02-26
#> 35 2021-02-26
#> 36 2021-02-26
#> 37 2021-02-26
#> 38 2021-02-26
#> 39 2021-02-26
#> 40 2021-02-26
#> 41 2021-02-26
#> 42 2021-02-26
#> 43 2021-02-26
#> 44 2021-02-26
#> 45 2021-03-31
#> 46 2021-03-31
#> 47 2021-03-31
#> 48 2021-03-31

Created on 2022-01-26 by the reprex package (v2.0.1)

Upvotes: 2

Valentia
Valentia

Reputation: 46

One way to do it would be creating another vector which indicates the current values in the df that you want to replace with the ones in the ChangeDates vector.

FromDates = c("2021-01-27",
              "2021-02-27",
              "2021-03-29")

Then look for the position in that vector that matches the respective current values in the df, feeding a new df with the values in the vector matching that same position.

new.df_dates <- data.frame()

for(i in 1:dim(df_dates)[1]){

  new.df_dates[i,1] <- ChangeDates[which(FromDates==df_dates$Date[i])]
  
}

colnames(new.df_dates) <- colnames(df_dates)

With this, you'd get:

head(new.df_dates)
        Date
1 2021-01-29
2 2021-01-29
3 2021-01-29
4 2021-01-29
5 2021-01-29
6 2021-01-29

tail(new.df_dates)
         Date
43 2021-02-26
44 2021-02-26
45 2021-03-31
46 2021-03-31
47 2021-03-31
48 2021-03-31

Upvotes: 1

Related Questions