Amanda Bovul
Amanda Bovul

Reputation: 11

Create new column with most recent date

I have some trouble with a dataset I have in data.table. Basically, I have 2 columns: scheduled delivery date and rescheduled delivery date. However, some values are left blank. An example:

Scheduled        Rescheduled
NA               NA
2016-11-14       2016-11-17
2016-11-15       NA
2016-11-13       2016-11-11
NA               2016-11-15

I want to create a new column, which indicates the most recent date of both columns, for instance named max_scheduled_date. Therefore, if Rescheduled is NA, the max_scheduled_date should take the value of Scheduled, whilst max_scheduled_date should take the value of Rescheduled if Scheduled is NA. When both columns are NA, max_scheduled_date should obviously take NA. When both columns have a date, it should take the most recent one. I have a lot of problems creating this and do not get the results I want.

The dates are POSIXct, which gives me some trouble unfortunately.

Can someone help me out? Thank you in advance, Kind regards, Amanda

Upvotes: 0

Views: 800

Answers (2)

Uwe
Uwe

Reputation: 42602

As the question is tagged with data.table, here is also a data.table solution.

pmax() seems to work sufficiently well with POSIXct. Therefore, I see no reason to coerce the date columns from POSIXct to Date class.

setDT(DF)[, max_scheduled_date := pmax(Scheduled, Rescheduled, na.rm = TRUE)]
DF
    Scheduled Rescheduled max_scheduled_date
1:       <NA>        <NA>               <NA>
2: 2016-11-14  2016-11-17         2016-11-17
3: 2016-11-15        <NA>         2016-11-15
4: 2016-11-13  2016-11-11         2016-11-13
5:       <NA>  2016-11-15         2016-11-15

Note that the new column is appended by reference, i.e., without copying the whole object.

Data

DF <- setDF(fread(
  "Scheduled        Rescheduled
NA               NA
2016-11-14       2016-11-17
2016-11-15       NA
2016-11-13       2016-11-11
NA               2016-11-15"
)[, lapply(.SD, as.POSIXct)])
str(DF)
'data.frame': 5 obs. of  2 variables:
 $ Scheduled  : POSIXct, format: NA "2016-11-14" "2016-11-15" ...
 $ Rescheduled: POSIXct, format: NA "2016-11-17" NA ...

Upvotes: 1

akrun
akrun

Reputation: 887991

Assuming that both columns are Date class, we can use pmax to create the max of the dates for each row

df1[] <- lapply(df1, as.Date) #change to Date class initially
df1$max_scheduled_date <- do.call(pmax, c(df1, na.rm = TRUE))
df1$max_scheduled_date
#[1] NA           "2016-11-17" "2016-11-15" "2016-11-13" "2016-11-15"

It can also be done with the tidyverse

library(dplyr)
df1 %>%
    mutate_all(as.Date) %>%
    mutate(max_scheduled_date = pmax(Scheduled, Rescheduled, na.rm = TRUE))

Upvotes: 0

Related Questions