alexb523
alexb523

Reputation: 738

parse_date_time mixing up day and month

I am using parse_date_time on a few columns in my data which have multiple date formats. For instance, a column could have the date formats "%d-%b-%y" and, "%m/%d/%Y".

Thus, "06/11/2017" ("%m/%d/%Y") and "11-Jun-17" ("%d-%b-%y") correspond to the same date: "2017-06-11" ("%Y-%m-%d")

I am using lubridate::parse_date_time to parse the dates, where different date formats can be specified in the orders argument:

x <- c("06-11-2017", "11-Jun-17")
parse_date_time(x, orders = c("%d-%b-%y", "%m/%d/%Y"), locale = "eng")
# [1] "2017-11-06 UTC" "2017-06-11 UTC"

While this should spit out both dates as "2017-06-11", it doesn't parse month and day correctly in the "%m/%d/%Y" date, checked by changing the month:

x <- c("05-11-2017", "11-Jun-17")
parse_date_time(x, orders = c("%d-%b-%y", "%m/%d/%Y"), locale = "eng")
# [1] "2017-11-05 UTC" "2017-06-11 UTC"

The 05 should be parsed as %m, but it is not.

My investigations of parse_date_time indicate there are bugs, which I think would be the case here. However, I am wondering what I can do to get the results I need?

update: here is a an example of the data i am using. by adding "exact = TRUE" it fixes my issue.

test$Transmission.date [1] "22-Feb-16" "10-Mar-16" "10-Mar-16" "10-Mar-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "15-Jan-16" "4/6/2016" [12] "4/6/2016" "4/6/2016" "4/6/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" "4/12/2016" [23] "4/12/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" "4/4/2016" [34] "4/4/2016" "4/4/2016" test$Buyer.RFQ.Close.Date [1] "25-Feb-16" "10-Mar-16" "10-Mar-16" "10-Mar-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "5/4/2016" [12] "5/4/2016" "5/4/2016" "5/4/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" "5/10/2016" [23] "5/10/2016" "4/28/2016" "5/2/2016" "4/28/2016" "5/2/2016" "5/2/2016" "5/2/2016" "5/2/2016" "5/2/2016" "4/28/2016" "5/2/2016" [34] "4/28/2016" "5/2/2016" test$RFQ.close.date [1] "25-Feb-16" "10-Mar-16" "10-Mar-16" "10-Mar-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "27-Jan-16" "4/22/2016" [12] "4/22/2016" "4/22/2016" "4/22/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" "5/3/2016" [23] "5/3/2016" "" "4/28/2016" "" "4/28/2016" "4/28/2016" "4/28/2016" "4/28/2016" "4/28/2016" "" "4/28/2016" [34] "" "4/28/2016"

Upvotes: 1

Views: 412

Answers (1)

d.b
d.b

Reputation: 32548

If it's only two different formats, you could do it in base R with relative ease. If there can be more formats, other approach would be necessary

x = c("06/11/2017", "11-Jun-17")
as.Date(x, format = c("%m/%d/%Y", "%d-%b-%y")[1 + grepl(pattern = "\\D{3}", x)])
#[1] "2017-06-11" "2017-06-11"

Upvotes: 2

Related Questions