AudileF
AudileF

Reputation: 446

R - Formatting dates in dataframe - mix of decimal and character values

I have a date column in a dataframe. I have read this df into R using openxlsx. The column is 'seen' as a character vector when I use typeof(df$date).

The column contains date information in several formats and I am looking to get this into the one format.

#Example
date <- c("43469.494444444441", "12/31/2019 1:41 PM", "12/01/2019  16:00:00")

#What I want -updated
fixed <- c("2019-04-01", "2019-12-31", "2019-12-01")

I have tried many work arounds including openxlsx::ConvertToDate, lubridate::parse_date_time, lubridate::date_decimal

openxlsx::ConvertToDateso far works best but it will only take 1 format and coerce NAs for the others

update

I realized I actually had one of the above output dates wrong. Value 43469.494444444441 should convert to 2019-04-01.

Upvotes: 4

Views: 948

Answers (2)

jay.sf
jay.sf

Reputation: 73342

You could use a helper function to normalize the dates which might be slightly faster than lubridate.

There are weird origins in MS Excel that depend on platform. So if the data are imported from different platforms, you may want to work woth dummy variables.

normDate <- Vectorize(function(x) {
if (!is.na(suppressWarnings(as.numeric(x))))  # Win excel
  as.Date(as.numeric(x), origin="1899-12-30")
else if (grepl("A|P", x))
  as.Date(x, format="%m/%d/%Y %I:%M %p")
else
  as.Date(x, format="%m/%d/%Y %R")
})

For additional date formats just add another else if. Format specifications can be found with ?strptime.

Then just use as.Date() with usual origin.

res <- as.Date(normDate(date), origin="1970-01-01")
# 43469.494444444441   12/31/2019 1:41 PM 12/01/2019  16:00:00 
#       "2019-01-04"         "2019-12-31"         "2019-12-01"
class(res)
# [1] "Date"

Edit: To achieve a specific output format, use format, e.g.

format(res, "%Y-%d-%m")
# 43469.494444444441   12/31/2019 1:41 PM 12/01/2019  16:00:00 
#       "2019-04-01"         "2019-31-12"         "2019-01-12" 

format(res, "%Y/%d/%m")
# 43469.494444444441   12/31/2019 1:41 PM 12/01/2019  16:00:00 
#       "2019/04/01"         "2019/31/12"         "2019/01/12" 

To lookup the codes type ?strptime.

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389155

Here is one way to do this in two-step. Change excel dates separately and all other dates differently. If you have some more formats of dates that can be added in parse_date_time.

temp <- lubridate::parse_date_time(date, c('mdY IMp', 'mdY HMS'))
temp[is.na(temp)] <- as.Date(as.numeric(date[is.na(temp)]), origin = "1899-12-30")

temp
#[1] "2019-01-04 11:51:59 UTC" "2019-12-31 13:41:00 UTC" "2019-12-01 16:00:00 UTC"
as.Date(temp)
#[1] "2019-01-04" "2019-12-31" "2019-12-01"

Upvotes: 5

Related Questions