Reputation: 183
I've received a CSV-file that I want to analyze in R but I'm facing an issue with a date column that I've never encountered before.
When opening the file in Excel a given date is displayed as 22.12.2020 00:00 in the cell and as 22.12.2020 00:00:00 in the formula bar. When read into R using dplyr::read_csv2 it's read in as "22.12.2020 00:00" with class character. When I try converting the column to datetime using lubridate::as_date or lubridate::as_datetime I get 2022-12-20 and 2022-12-20 20:00:00 respectively. I'm guessing it must be due to the seconds missing from the initial string. I've tried adding ":00" to the end of the string before doing the conversion but that only leads to NA. Can anyone give me a hint on how to solve this?
test4 <- structure(list(ORDER_STATUS_DATE = 20201222, DAY = "22.12.2020 00:00"), row.names = c(NA,
-1L), class = c("tbl_df", "tbl", "data.frame"))
test4 %>%
mutate(DAY = as_datetime(DAY))
# Returns 2022-12-20 20:00:00 but should ideally have returned 2022-12-22 00:00:00
test4 %>%
mutate(DAY = as_date(DAY))
# Returns 2022-12-20
test4 %>%
mutate(DAY = DAY %>% paste0(":00:00"))
# Returns 22.12.2020 00:00:00:00 so converting to date or datetime leads to NAs
Upvotes: 0
Views: 753
Reputation: 6485
You don't necessarily need lubridate here (although it is a great library):
as.POSIXct(test4$DAY, tz = "UTC", format="%d.%m.%Y %H:%M")
Returns:
"2020-12-22 UTC"
Upvotes: 1
Reputation: 2767
If the date formats are ambiguous you need to specify what it is - lubridate is great for this.
lubridate::dmy_hm("22.12.2020 00:00")
#> [1] "2020-12-22 UTC"
Upvotes: 1