Reputation: 1040
I am trying to read an excel file into R. Among other fields, the excel file has two "date" fields, each containing both the date and time stamp in the SAME field.
Example:
StartDate 9/14/2019 10:18:59 AM
EndDate 9/18/2019 2:27:14 AM
When I tried read_excel to read in the excel file, the data frame formatted these two columns very strangely. It spat out the days (with decimals). Such as 43712.429849537039, Which I thought was days from Jan-01-1970 (the origin date that popped up when I typed lubrudate::origin).
data %<>%
mutate(StartDate = as.Date(StartDate, origin = "1970-01-01 UTC"))
So I tried converting this back using as.Date, but it converts it to the totally wrong date... (converts all the dates to the year 2089). Example, 2089-09-05.
Any help with this would be really appreciated! There must be a simpler way to directly read in a date-time column?!
Upvotes: 0
Views: 967
Reputation: 1040
It turns out that excel has a different "origin date" from R. Excels counts the days from 01-01-1900, where as R counts days from 01-01-1970.
When I used read_excel to read the file into a df, R used excels' counts of days. Which is why I got a weird date when I tried to convert to the date format using 1970. As soon as I used as.Date with excels "origin" date of 1990 (excels origin date), my dates parsed out correctly!
Upvotes: 1
Reputation: 4092
You can use the lubridate package, it is excellent:
library(tidyverse)
df <- data.frame(StartDate =c("9/14/2019 10:18:59 AM","9/14/2019 3:18:59 PM"),
EndDate= c("9/18/2019 2:27:14 AM","9/18/2019 1:27:14 PM"))
df <- df %>% mutate(StartDate = lubridate::mdy_hms(StartDate), EndDate = lubridate::mdy_hms(EndDate))
Upvotes: 0