Reputation: 738
I am using parse_date_time
on a few columns in my data which have multiple date formats. For instance, a column could have the date formats "%d-%b-%y"
and, "%m/%d/%Y"
.
Thus, "06/11/2017"
("%m/%d/%Y"
) and "11-Jun-17"
("%d-%b-%y"
) correspond to the same date: "2017-06-11" ("%Y-%m-%d"
)
I am using lubridate::parse_date_time
to parse the dates, where different date formats can be specified in the orders
argument:
x <- c("06-11-2017", "11-Jun-17")
parse_date_time(x, orders = c("%d-%b-%y", "%m/%d/%Y"), locale = "eng")
# [1] "2017-11-06 UTC" "2017-06-11 UTC"
While this should spit out both dates as "2017-06-11"
, it doesn't parse month and day correctly in the "%m/%d/%Y"
date, checked by changing the month:
x <- c("05-11-2017", "11-Jun-17")
parse_date_time(x, orders = c("%d-%b-%y", "%m/%d/%Y"), locale = "eng")
# [1] "2017-11-05 UTC" "2017-06-11 UTC"
The 05
should be parsed as %m
, but it is not.
My investigations of parse_date_time
indicate there are bugs, which I think would be the case here. However, I am wondering what I can do to get the results I need?
update: here is a an example of the data i am using. by adding "exact = TRUE" it fixes my issue.
test$Transmission.date [1] "22-Feb-16" "10-Mar-16" "10-Mar-16" "10-Mar-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "4/6/2016" [12] "4/6/2016" "4/6/2016" "4/6/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" [23] "4/12/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" [34] "4/4/2016" "4/4/2016" test$Buyer.RFQ.Close.Date [1] "25-Feb-16" "10-Mar-16" "10-Mar-16" "10-Mar-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "5/4/2016" [12] "5/4/2016" "5/4/2016" "5/4/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" [23] "5/10/2016" "4/28/2016" "5/2/2016" "4/28/2016" "5/2/2016" "5/2/2016" "5/2/2016" "5/2/2016" "5/2/2016" "4/28/2016" "5/2/2016" [34] "4/28/2016" "5/2/2016" test$RFQ.close.date [1] "25-Feb-16" "10-Mar-16" "10-Mar-16" "10-Mar-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "4/22/2016" [12] "4/22/2016" "4/22/2016" "4/22/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" [23] "5/3/2016" "" "4/28/2016" "" "4/28/2016" "4/28/2016" "4/28/2016" "4/28/2016" "4/28/2016" "" "4/28/2016" [34] "" "4/28/2016"
Upvotes: 1
Views: 412
Reputation: 32548
If it's only two different formats, you could do it in base R with relative ease. If there can be more formats, other approach would be necessary
x = c("06/11/2017", "11-Jun-17")
as.Date(x, format = c("%m/%d/%Y", "%d-%b-%y")[1 + grepl(pattern = "\\D{3}", x)])
#[1] "2017-06-11" "2017-06-11"
Upvotes: 2