Abdallah Khamash
Abdallah Khamash

Reputation: 53

Converting to date in a character column that contains two date formats

I have CSV file with "date" column but it contains two different date format as the following

7/12/2015 15:28 as m/d/yyyy hh:mm
18-04-2016 18:20  as d/m/yyyy hh:mm

How can I change the format into m/d/yyyy hh: mm, So I can subtract the dates from each other?

Upvotes: 5

Views: 2021

Answers (4)

Tavrock
Tavrock

Reputation: 558

Not all *.csv files are created equally. If the date information is coded as a date in Excel or an open office format but simply displayed in two different formats in the file, it would be much simpler to change the dates in a spreadsheet environment than trying to tackle the task in R.

Upvotes: -3

Dirk is no longer here
Dirk is no longer here

Reputation: 368439

As @thelatemail stated, it is difficult to deal with ambiguous date formats. You have the worst here: a combination of (North American) month-day-year along with (rest-of-the work) day-month-year.

Now, the anytime package helps here in general as it allows us to parse without requiring explicit formats while also allowing different input formats in the same string.

However, it too must retain some sanity -- and hence does not support mixing d/m/y and m/d/y as you do here by default because there is just no way to automate this.

But here we can just opt to add a single missing format, and all is well:

R> library(anytime)
R> anytime::addFormats("%d-%m-%Y %H:%M:%S")  # add a day-month-year variant
R> anytime(c("7/12/2015 15:28", "18-04-2016 18:20"))
[1] "2015-07-12 15:28:00 CDT" "2016-04-18 18:20:00 CDT"
R> 

And with that, the difference is a simple

R> diff(anytime(c("7/12/2015 15:28", "18-04-2016 18:20")))
Time difference of 281.119 days
R> 

Upvotes: 4

akrun
akrun

Reputation: 887621

Here is another option with lubridate

library(lubridate)
parse_date_time(x, c("mdy HM", "dmy HM"))   
#[1] "2015-12-07 15:28:00 UTC" "2016-04-18 18:20:00 UTC"

data

x <- c("7/12/2015 15:28","18-04-2016 18:20")

Upvotes: 3

thelatemail
thelatemail

Reputation: 93938

External packages are not required if you simply have two datetime formats. Just run both formats through the parser and take the non-missing one:

x <- c("7/12/2015 15:28","18-04-2016 18:20")
pmax(
  as.POSIXct(x, format="%m/%d/%Y %H:%M", tz="UTC"),
  as.POSIXct(x, format="%d-%m-%Y %H:%M", tz="UTC"),
  na.rm=TRUE
)
#[1] "2015-07-12 15:28:00 UTC" "2016-04-18 18:20:00 UTC"

As far as I know, there is absolutely no way to deal with ambiguous date formats automatically, so hard-coding is the way to go here probably.

Upvotes: 9

Related Questions