Reputation: 53
I have CSV file with "date" column but it contains two different date format as the following
7/12/2015 15:28 as m/d/yyyy hh:mm
18-04-2016 18:20 as d/m/yyyy hh:mm
How can I change the format into m/d/yyyy hh: mm, So I can subtract the dates from each other?
Upvotes: 5
Views: 2021
Reputation: 558
Not all *.csv
files are created equally. If the date information is coded as a date in Excel or an open office format but simply displayed in two different formats in the file, it would be much simpler to change the dates in a spreadsheet environment than trying to tackle the task in R.
Upvotes: -3
Reputation: 368439
As @thelatemail stated, it is difficult to deal with ambiguous date formats. You have the worst here: a combination of (North American) month-day-year along with (rest-of-the work) day-month-year.
Now, the anytime package helps here in general as it allows us to parse without requiring explicit formats while also allowing different input formats in the same string.
However, it too must retain some sanity -- and hence does not support mixing d/m/y and m/d/y as you do here by default because there is just no way to automate this.
But here we can just opt to add a single missing format, and all is well:
R> library(anytime)
R> anytime::addFormats("%d-%m-%Y %H:%M:%S") # add a day-month-year variant
R> anytime(c("7/12/2015 15:28", "18-04-2016 18:20"))
[1] "2015-07-12 15:28:00 CDT" "2016-04-18 18:20:00 CDT"
R>
And with that, the difference is a simple
R> diff(anytime(c("7/12/2015 15:28", "18-04-2016 18:20")))
Time difference of 281.119 days
R>
Upvotes: 4
Reputation: 887621
Here is another option with lubridate
library(lubridate)
parse_date_time(x, c("mdy HM", "dmy HM"))
#[1] "2015-12-07 15:28:00 UTC" "2016-04-18 18:20:00 UTC"
x <- c("7/12/2015 15:28","18-04-2016 18:20")
Upvotes: 3
Reputation: 93938
External packages are not required if you simply have two datetime formats. Just run both formats through the parser and take the non-missing one:
x <- c("7/12/2015 15:28","18-04-2016 18:20")
pmax(
as.POSIXct(x, format="%m/%d/%Y %H:%M", tz="UTC"),
as.POSIXct(x, format="%d-%m-%Y %H:%M", tz="UTC"),
na.rm=TRUE
)
#[1] "2015-07-12 15:28:00 UTC" "2016-04-18 18:20:00 UTC"
As far as I know, there is absolutely no way to deal with ambiguous date formats automatically, so hard-coding is the way to go here probably.
Upvotes: 9