Math Avengers
Math Avengers

Reputation: 792

Change inconsistent date format in R

I have a xlsx file with inconcisitent date format:

xlsx file with different date format

I load the file into r using read_xlsx(). How can I change and read the entire file in (ex. 2019-12-28) form? Here's my code:

file=file.choose()
date=read_xlsx(file)
dim(date)
[1] 37  1
date[1,1]
43827   #this should corrspond to 2019-12-28

I tried as.Date(43827, origin = "2019-12-28"), but I got [1] "2139-12-26" as result.

Edit for comment.

> dput(date)
structure(list(Date = c("43827", "43823", "43821", "43818", "43816", 
"43813", "43811", "43809", "43806", "43804", "28/2/2019", "26/2/2019", 
"23/2/2019", "21/2/2019", "19/2/2019", "16/2/2019", "14/2/2019", 
"43618", "43526", "31/1/2019", "29/1/2019", "43262", "43170", 
"43111", "30/10/2018", "27/10/2018", "25/10/2018", "23/10/2018", 
"42593", "42501", "42440", "42380", "29/10/2016", "27/10/2016", 
"25/10/2016", "22/10/2016", "20/10/2016")), row.names = c(NA, 
-37L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 884

Answers (3)

Khaynes
Khaynes

Reputation: 1986

The origin is wrong, in Excel is December 30, 1899.

# Coerce all Excel origin numerics into date looking strings ...
date$Date[!grepl("/", date$Date)] <- format(as.Date(as.numeric(date$Date[!grepl("/", date$Date)]) , origin = "1899-12-30"), "%d/%m/%Y")

# Convert all to dates ...
date$Date <- as.Date(date$Date, "%d/%m/%Y")

date$Date
#  [1] "2019-12-28" "2019-12-24" "2019-12-22" "2019-12-19" "2019-12-17" "2019-12-14" "2019-12-12" "2019-12-10"     
#  [9] "2019-12-07" "2019-12-05" "2019-02-28" "2019-02-26" "2019-02-23" "2019-02-21" "2019-02-19" "2019-02-16"     
# [17] "2019-02-14" "2019-06-02" "2019-03-02" "2019-01-31" "2019-01-29" "2018-06-11" "2018-03-11" "2018-01-11"     
# [25] "2018-10-30" "2018-10-27" "2018-10-25" "2018-10-23" "2016-08-11" "2016-05-11" "2016-03-11" "2016-01-11"     
# [33] "2016-10-29" "2016-10-27" "2016-10-25" "2016-10-22" "2016-10-20"

Data:

date <- structure(list(Date = c("43827", "43823", "43821", "43818", "43816", 
"43813", "43811", "43809", "43806", "43804", "28/2/2019", "26/2/2019", 
"23/2/2019", "21/2/2019", "19/2/2019", "16/2/2019", "14/2/2019", 
"43618", "43526", "31/1/2019", "29/1/2019", "43262", "43170", 
"43111", "30/10/2018", "27/10/2018", "25/10/2018", "23/10/2018", 
"42593", "42501", "42440", "42380", "29/10/2016", "27/10/2016", 
"25/10/2016", "22/10/2016", "20/10/2016")), row.names = c(NA, 
-37L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389325

Since you have dates with mixed format, you can try lubridate::parse_date_time

lubridate::parse_date_time(x, c('Ymd', 'dmY'))
#[1] "2019-12-28 UTC" "2019-02-28 UTC" "2019-10-25 UTC" "2019-06-02 UTC"

Or anytime::anydate

library(anytime)
addFormats(c('%Y-%m-%d', '%d/%m/%Y'))
anytime::anydate(x)

Replace x with column name date$Date. You can include all the possible formats your date can take in both the approach.

data

x <- c('2019-12-28', '28/2/2019', '25/10/2019', '2019-06-02')

Upvotes: 2

ThetaFC
ThetaFC

Reputation: 670

I made an example Excel file with 4 dates from your image. This works because I have the dates read-in as text using the col_types='text argument to read_xlsx

library(readxl)
dates <- read_xlsx(path = 'put_your_filepath_here', col_types = 'text')
dates$Date #example using 4 of your dates with 2 different formats
#[1] "2019-12-28" "28/2/2019"  "2019-06-02" "20/10/2016"
dates$Date[grepl('/', dates$Date)] #this is how you select the relevant rows to modify their format
#[1] "28/2/2019"  "20/10/2016"
dates$Date[grepl('/', dates$Date)] <- sapply(dates$Date[grepl('/', dates$Date)], function(x) {
  paste0(unlist(strsplit(x, '/'))[c(3,2,1)], collapse = '-')
}, USE.NAMES = FALSE) #sapply is necessary to do modification 
#because strsplit is not a vectorized function
dates$Date
#[1] "2019-12-28" "2019-2-28"  "2019-06-02" "2016-10-20" #still in character mode
dates$Date <- as.Date(dates$Date, format="%Y-%m-%d") #convert to date mode
dates$Date
#[1]"2019-12-28" "2019-02-28" "2019-06-02" "2016-10-20" #voila!

Hope this works for you!

Upvotes: 0

Related Questions