Reputation: 78
I have a large number of csv files with a datetime column which have been manually patched in the past and now include two date formats:
dd/mm/yyyy hh:mm
mm/dd/yyyy hh:mm
I am having trouble parsing the date correctly from the switch-over point
I am using lubridate::parse_date_time() with the orders I have:
parse_date_time(date_time, orders = c("dmY HM", "mdY HM", "Ymd HMS"))
This works fine when one of the values is greater than 12 and so is correctly identified as a day not a month. The issue is when the date parses (correctly or incorrectly) using either format:
eg: 03/04/2011 could be: 2011-04-03 or 2011-03-04
Upvotes: 0
Views: 216
Reputation: 3499
If you want to base it on the order as you suggested in the comments you could do something like what I've done below. You could definitely run into problems if there is long periods missing and I haven't thoroughly tested the logic
library(dplyr)
library(zoo)
#Made up some data
testinput <- read.table(sep = ",",
text = "25/03/2010 01:00
03/04/2010 02:00
15/04/2010 01:00
05/01/2010 07:00
16/05/2010 09:00
05/20/2010 15:00
05/22/2010 08:00
01/06/2010 21:00
20/06/2010 14:00"
)
# try converting it both ways
test <- testinput %>% mutate(date1 = as.POSIXct(V1, format = "%d/%m/%Y %H:%M"),
date2 = as.POSIXct(V1, format = "%m/%d/%Y %H:%M"))
#check the order - not sure if there is a better way to do this
test2 <- test %>% mutate(date1test = date1 >= lag(zoo::na.locf(date1, na.rm = F))&
date1<= lead(zoo::na.locf(date1,fromLast=T, na.rm = F)),
date2test = date2 >= lag(zoo::na.locf(date2, na.rm = F))&
date2<= lead(zoo::na.locf(date2,fromLast=T, na.rm = F)))
#pick which one to use
test2 %>% mutate(Out =
case_when(is.na(date2)~date1,
is.na(date1)~date2,
date1test~date1,
date2test~date2))
Upvotes: 1