Reputation: 11
I have some trouble with a dataset I have in data.table. Basically, I have 2 columns: scheduled delivery date and rescheduled delivery date. However, some values are left blank. An example:
Scheduled Rescheduled
NA NA
2016-11-14 2016-11-17
2016-11-15 NA
2016-11-13 2016-11-11
NA 2016-11-15
I want to create a new column, which indicates the most recent
date of both columns, for instance named max_scheduled_date
.
Therefore, if Rescheduled
is NA, the max_scheduled_date
should
take the value of Scheduled
, whilst max_scheduled_date
should
take the value of Rescheduled
if Scheduled
is NA. When both
columns are NA, max_scheduled_date
should obviously take NA.
When both columns have a date, it should take the most recent one.
I have a lot of problems creating this and do not get the results I want.
The dates are POSIXct, which gives me some trouble unfortunately.
Can someone help me out? Thank you in advance, Kind regards, Amanda
Upvotes: 0
Views: 800
Reputation: 42602
As the question is tagged with data.table
, here is also a data.table
solution.
pmax()
seems to work sufficiently well with POSIXct
. Therefore, I see no reason to coerce the date columns from POSIXct
to Date
class.
setDT(DF)[, max_scheduled_date := pmax(Scheduled, Rescheduled, na.rm = TRUE)]
DF
Scheduled Rescheduled max_scheduled_date 1: <NA> <NA> <NA> 2: 2016-11-14 2016-11-17 2016-11-17 3: 2016-11-15 <NA> 2016-11-15 4: 2016-11-13 2016-11-11 2016-11-13 5: <NA> 2016-11-15 2016-11-15
Note that the new column is appended by reference, i.e., without copying the whole object.
DF <- setDF(fread(
"Scheduled Rescheduled
NA NA
2016-11-14 2016-11-17
2016-11-15 NA
2016-11-13 2016-11-11
NA 2016-11-15"
)[, lapply(.SD, as.POSIXct)])
str(DF)
'data.frame': 5 obs. of 2 variables: $ Scheduled : POSIXct, format: NA "2016-11-14" "2016-11-15" ... $ Rescheduled: POSIXct, format: NA "2016-11-17" NA ...
Upvotes: 1
Reputation: 887991
Assuming that both columns are Date
class, we can use pmax
to create the max of the dates for each row
df1[] <- lapply(df1, as.Date) #change to Date class initially
df1$max_scheduled_date <- do.call(pmax, c(df1, na.rm = TRUE))
df1$max_scheduled_date
#[1] NA "2016-11-17" "2016-11-15" "2016-11-13" "2016-11-15"
It can also be done with the tidyverse
library(dplyr)
df1 %>%
mutate_all(as.Date) %>%
mutate(max_scheduled_date = pmax(Scheduled, Rescheduled, na.rm = TRUE))
Upvotes: 0