Timur
Timur

Reputation: 15

R: readxl and date format

I read in an excel file, where 1 column contains dates in different format: excel format (e.g. 43596) and text (e.g. "01.01.2020"). To convert excel format one can use as.Date(as.numeric(df$date), origin = "1899-12-30") to convert text one can use as.Date(df$date, format = "%d.%m.%Y") These work for individual values, but when I try ifelse as:

df$date <- ifelse(length(df$date)==5, 
            as.Date(as.numeric(df$date), origin = "1899-12-30"),
            as.Date(df$date, format = "%d.%m.%Y"))

or a for loop:

  for (i in length(x)) {
  if(nchar(x[i])==5) {
    y[i] <- as.Date(as.numeric(x[i]), origin = "1899-12-30")
  } else {x[i] <- as.Date(x[i], , format = "%d.%m.%Y"))}
  } print(x)

It does not work because of:

"character string is not in a standard unambiguous format"

Maybe you could advice a better solution to convert/ replace different date formats in the appropriate one?

Upvotes: 0

Views: 449

Answers (3)

Andre Wildberg
Andre Wildberg

Reputation: 19191

A tidyverse solution using rowwise

library(dplyr)
library(lubridate)

df %>% 
  rowwise() %>% 
  mutate(date_new=as.Date(ifelse(grepl("\\.",date),
    as.character(dmy(date)),
    as.character(as.Date(as.numeric(date), origin="1899-12-30"))))) %>% 
  ungroup()
# A tibble: 6 × 3
  contract date       date_new  
     <dbl> <chr>      <date>    
1   231429 43596      2019-05-11
2   231437 07.01.2020 2020-01-07
3   231449 01.01.2021 2021-01-01
4   231459 03.03.2020 2020-03-03
5   231463 44131      2020-10-27
6   231466 40619      2011-03-17

Upvotes: 0

Waldi
Waldi

Reputation: 41260

You could use sapply to apply ifelse to each value:

df$date <- as.Date(sapply(df$date,function(date) ifelse(nchar(date)==5, 
                                     as.Date(as.numeric(date), origin = "1899-12-30"),
                                     as.Date(date, format = "%d.%m.%Y"))),
                   origin="1970-01-01")
df

# A tibble: 6 x 2
  contract date      
     <dbl> <date>    
1   231429 2019-05-11
2   231437 2020-01-07
3   231449 2021-01-01
4   231459 2020-03-03
5   231463 2020-10-27
6   231466 2011-03-17

Upvotes: 0

AugtPelle
AugtPelle

Reputation: 549

I have 2 solutions for it.

  1. Changing the code, which I don't like because you are depending on xlsx date formats:
> df <- tibble(date = c("01.01.2020","43596"))
> 
> df$date <- as.Date(ifelse(nchar(df$date)==5, 
+                           as.Date(as.numeric(df$date), origin = "1899-12-30"),
+                           as.Date(df$date, format = "%d.%m.%Y")), origin = "1970-01-01")
Warning message:
In as.Date(as.numeric(df$date), origin = "1899-12-30") :
  NAs introducidos por coerción
> 
> df$date
[1] "2020-01-01" "2019-05-11"
>
  1. Save the document as CSV and use read_csv() function from readr package. That solves everything !!!!

Upvotes: 1

Related Questions