Reputation: 792
I have a xlsx file with inconcisitent date format:
I load the file into r using read_xlsx()
. How can I change and read the entire file in (ex. 2019-12-28) form?
Here's my code:
file=file.choose()
date=read_xlsx(file)
dim(date)
[1] 37 1
date[1,1]
43827 #this should corrspond to 2019-12-28
I tried as.Date(43827, origin = "2019-12-28")
, but I got [1] "2139-12-26"
as result.
Edit for comment.
> dput(date)
structure(list(Date = c("43827", "43823", "43821", "43818", "43816",
"43813", "43811", "43809", "43806", "43804", "28/2/2019", "26/2/2019",
"23/2/2019", "21/2/2019", "19/2/2019", "16/2/2019", "14/2/2019",
"43618", "43526", "31/1/2019", "29/1/2019", "43262", "43170",
"43111", "30/10/2018", "27/10/2018", "25/10/2018", "23/10/2018",
"42593", "42501", "42440", "42380", "29/10/2016", "27/10/2016",
"25/10/2016", "22/10/2016", "20/10/2016")), row.names = c(NA,
-37L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Views: 884
Reputation: 1986
The origin is wrong, in Excel is December 30, 1899.
# Coerce all Excel origin numerics into date looking strings ...
date$Date[!grepl("/", date$Date)] <- format(as.Date(as.numeric(date$Date[!grepl("/", date$Date)]) , origin = "1899-12-30"), "%d/%m/%Y")
# Convert all to dates ...
date$Date <- as.Date(date$Date, "%d/%m/%Y")
date$Date
# [1] "2019-12-28" "2019-12-24" "2019-12-22" "2019-12-19" "2019-12-17" "2019-12-14" "2019-12-12" "2019-12-10"
# [9] "2019-12-07" "2019-12-05" "2019-02-28" "2019-02-26" "2019-02-23" "2019-02-21" "2019-02-19" "2019-02-16"
# [17] "2019-02-14" "2019-06-02" "2019-03-02" "2019-01-31" "2019-01-29" "2018-06-11" "2018-03-11" "2018-01-11"
# [25] "2018-10-30" "2018-10-27" "2018-10-25" "2018-10-23" "2016-08-11" "2016-05-11" "2016-03-11" "2016-01-11"
# [33] "2016-10-29" "2016-10-27" "2016-10-25" "2016-10-22" "2016-10-20"
Data:
date <- structure(list(Date = c("43827", "43823", "43821", "43818", "43816",
"43813", "43811", "43809", "43806", "43804", "28/2/2019", "26/2/2019",
"23/2/2019", "21/2/2019", "19/2/2019", "16/2/2019", "14/2/2019",
"43618", "43526", "31/1/2019", "29/1/2019", "43262", "43170",
"43111", "30/10/2018", "27/10/2018", "25/10/2018", "23/10/2018",
"42593", "42501", "42440", "42380", "29/10/2016", "27/10/2016",
"25/10/2016", "22/10/2016", "20/10/2016")), row.names = c(NA,
-37L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 0
Reputation: 389325
Since you have dates with mixed format, you can try lubridate::parse_date_time
lubridate::parse_date_time(x, c('Ymd', 'dmY'))
#[1] "2019-12-28 UTC" "2019-02-28 UTC" "2019-10-25 UTC" "2019-06-02 UTC"
Or anytime::anydate
library(anytime)
addFormats(c('%Y-%m-%d', '%d/%m/%Y'))
anytime::anydate(x)
Replace x
with column name date$Date
. You can include all the possible formats your date can take in both the approach.
data
x <- c('2019-12-28', '28/2/2019', '25/10/2019', '2019-06-02')
Upvotes: 2
Reputation: 670
I made an example Excel file with 4 dates from your image. This works because I have the dates read-in as text using the col_types='text
argument to read_xlsx
library(readxl)
dates <- read_xlsx(path = 'put_your_filepath_here', col_types = 'text')
dates$Date #example using 4 of your dates with 2 different formats
#[1] "2019-12-28" "28/2/2019" "2019-06-02" "20/10/2016"
dates$Date[grepl('/', dates$Date)] #this is how you select the relevant rows to modify their format
#[1] "28/2/2019" "20/10/2016"
dates$Date[grepl('/', dates$Date)] <- sapply(dates$Date[grepl('/', dates$Date)], function(x) {
paste0(unlist(strsplit(x, '/'))[c(3,2,1)], collapse = '-')
}, USE.NAMES = FALSE) #sapply is necessary to do modification
#because strsplit is not a vectorized function
dates$Date
#[1] "2019-12-28" "2019-2-28" "2019-06-02" "2016-10-20" #still in character mode
dates$Date <- as.Date(dates$Date, format="%Y-%m-%d") #convert to date mode
dates$Date
#[1]"2019-12-28" "2019-02-28" "2019-06-02" "2016-10-20" #voila!
Hope this works for you!
Upvotes: 0