Economist
Economist

Reputation: 183

Converting a date column from Excel/CSV to R gives the wrong date

I've received a CSV-file that I want to analyze in R but I'm facing an issue with a date column that I've never encountered before.

When opening the file in Excel a given date is displayed as 22.12.2020 00:00 in the cell and as 22.12.2020 00:00:00 in the formula bar. When read into R using dplyr::read_csv2 it's read in as "22.12.2020 00:00" with class character. When I try converting the column to datetime using lubridate::as_date or lubridate::as_datetime I get 2022-12-20 and 2022-12-20 20:00:00 respectively. I'm guessing it must be due to the seconds missing from the initial string. I've tried adding ":00" to the end of the string before doing the conversion but that only leads to NA. Can anyone give me a hint on how to solve this?

test4 <- structure(list(ORDER_STATUS_DATE = 20201222, DAY = "22.12.2020 00:00"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame"))

test4 %>% 
  mutate(DAY = as_datetime(DAY))

# Returns 2022-12-20 20:00:00 but should ideally have returned 2022-12-22 00:00:00

test4 %>% 
  mutate(DAY = as_date(DAY))

# Returns 2022-12-20

test4 %>% 
  mutate(DAY = DAY %>% paste0(":00:00"))
  
# Returns 22.12.2020 00:00:00:00 so converting to date or datetime leads to NAs

Upvotes: 0

Views: 753

Answers (2)

dario
dario

Reputation: 6485

You don't necessarily need lubridate here (although it is a great library):

as.POSIXct(test4$DAY, tz = "UTC",  format="%d.%m.%Y %H:%M")

Returns:

"2020-12-22 UTC"

Upvotes: 1

pseudospin
pseudospin

Reputation: 2767

If the date formats are ambiguous you need to specify what it is - lubridate is great for this.

lubridate::dmy_hm("22.12.2020 00:00")
#> [1] "2020-12-22 UTC"

Upvotes: 1

Related Questions