R overflow
R overflow

Reputation: 1352

Import Excel files with different Date types

Got a lot of Excel files, all containing the same columns. One of the columns is called "DateTime". The problem is, that this column contain different formats, e.g:

2/12/2013  11:59:45 PM
20/2/2013 14:36:58

As you can see, the first example contains a PM at the end. Here is my problem:

My code to import all the Excel files in the specific folder:

file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
df <- bind_rows(df.list, .id = "id")

But, after I run the second line (df.list <- lapply(file.list, read_excel)) I got a warnings like (Expecting numeric, got a date).

I tried different things, such as assigning col_types = "text" (I hoped that I could import all the data as character, and afterwards change remove the PM at the end and change the characters back to as.Date again). But unfortunately the results was a lot of NA's.

Another thing that I tried was to import the the file as numeric (but again, received the same message "Expecting numeric, got a date).

So how can I tackle this problem? I can do it by changing the columns type in Excel, but that is not what I want to do (I prefer to do everything in R).

Edit

Some data as requested (after I imported the data into R with as "character" with the lubridate as_date function, as suggested).

Num Date
1  41277.000601851854
2  41277.004710648151
3  41277.004826388889
4  41277.007372685184
5  41277.007789351854
6  41277.007870370369
7  41277.010335648149
8  41277.010868055557
9  41277.013842592591
10 41277.014548611114

Upvotes: 1

Views: 696

Answers (2)

bmrn
bmrn

Reputation: 500

First read in all data as char strings, it will be easy to clean afterwards.

Then I think the lubridate package could help you here. I don't know how many files/formats you have but the example given can be read by the same lubridate::dmy_hms() function:

# >datim <- c("20/2/2013 14:36:58", "2/12/2013  11:59:45 PM")
# >dmy_hms(datim)
# [1] "2013-02-20 14:36:58 UTC" "2013-12-02 23:59:45 UTC"

There is a function for each ordering of time units. mdy_hm() for example. If you have more variation in your time formats you would use stringr::str_detect(your_date_variable, "date_format_regex") to identify lines with each type of format before applying the appropriate dmy_hm() function.

Upvotes: 1

sorearm
sorearm

Reputation: 409

lubridate has the excellent parse_date_time function, you can specify the orders of day/month/year and also 'truncate', e.g. allow for some formats to have missing numbers.

library(lubridate)

time <- c("2/12/2013 11:59:45 PM", "20/2/2013 14:36:58")
parse_date_time(time, orders = c('dmyHMS') )

[1] "2013-12-02 11:59:45 UTC" "2013-02-20 14:36:58 UTC"

Upvotes: 2

Related Questions