Reputation: 1352
Got a lot of Excel files, all containing the same columns. One of the columns is called "DateTime". The problem is, that this column contain different formats, e.g:
2/12/2013 11:59:45 PM
20/2/2013 14:36:58
As you can see, the first example contains a PM at the end. Here is my problem:
My code to import all the Excel files in the specific folder:
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)
df <- bind_rows(df.list, .id = "id")
But, after I run the second line (df.list <- lapply(file.list, read_excel)
) I got a warnings like (Expecting numeric, got a date).
I tried different things, such as assigning col_types = "text"
(I hoped that I could import all the data as character, and afterwards change remove the PM at the end and change the characters back to as.Date
again). But unfortunately the results was a lot of NA
's.
Another thing that I tried was to import the the file as numeric (but again, received the same message "Expecting numeric, got a date).
So how can I tackle this problem? I can do it by changing the columns type in Excel, but that is not what I want to do (I prefer to do everything in R).
Edit
Some data as requested (after I imported the data into R with as "character" with the lubridate as_date function, as suggested).
Num Date
1 41277.000601851854
2 41277.004710648151
3 41277.004826388889
4 41277.007372685184
5 41277.007789351854
6 41277.007870370369
7 41277.010335648149
8 41277.010868055557
9 41277.013842592591
10 41277.014548611114
Upvotes: 1
Views: 696
Reputation: 500
First read in all data as char strings, it will be easy to clean afterwards.
Then I think the lubridate
package could help you here. I don't know how many files/formats you have but the example given can be read by the same lubridate::dmy_hms()
function:
# >datim <- c("20/2/2013 14:36:58", "2/12/2013 11:59:45 PM")
# >dmy_hms(datim)
# [1] "2013-02-20 14:36:58 UTC" "2013-12-02 23:59:45 UTC"
There is a function for each ordering of time units. mdy_hm()
for example. If you have more variation in your time formats you would use stringr::str_detect(your_date_variable, "date_format_regex")
to identify lines with each type of format before applying the appropriate dmy_hm() function.
Upvotes: 1
Reputation: 409
lubridate
has the excellent parse_date_time
function, you can specify the orders of day/month/year and also 'truncate', e.g. allow for some formats to have missing numbers.
library(lubridate)
time <- c("2/12/2013 11:59:45 PM", "20/2/2013 14:36:58")
parse_date_time(time, orders = c('dmyHMS') )
[1] "2013-12-02 11:59:45 UTC" "2013-02-20 14:36:58 UTC"
Upvotes: 2