Reputation: 5646
I have a data frame column of class character
containing different date formats:
foo=data.frame(Date=c("29-Jan-16","29-Jan-16","2/5/2016","2/5/2016"),stringsAsFactors = F)
I would like to convert the column Date
to a vector of class Date
objects. I can correctly parse each format separately:
> as.Date( foo$Date, format = "%d-%b-%y" )
[1] "2016-01-29" "2016-01-29" NA NA
> as.Date( foo$Date, format = "%m/%d/%Y" )
[1] NA NA "2016-02-05" "2016-02-05"
So I thought to join the two parsing instructions with ifelse
and grepl
. Note that grepl
correctly identifies the rows where the first format is used
> grepl("-",foo$Date)
[1] TRUE TRUE FALSE FALSE
However, the complet instruction doesn't work:
bar = foo
bar$Date=ifelse(grepl("-",foo$Date),
as.Date( foo$Date, format = "%d-%b-%y" ),
as.Date( foo$Date, format = "%m/%d/%Y" ))
> bar
Date
1 16829
2 16829
3 16836
4 16836
Questions:
ifelse
, and create more complex regexps. The code will soon become nasty and unreadable. Isn't there a way to have R automatically find the right data format for each element of foo$Date
?Upvotes: 3
Views: 235
Reputation: 887511
It would be easier to work with lubridate
. Assuming that the format of the 'Date' is in the order day, month, year, we can use dmy
.
library(lubridate)
dmy(foo$Date)
#[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-05-02 UTC" "2016-05-02 UTC"
In case there are other variations in the order, we can also use guess_formats
with parse_date_time
.
with(foo, parse_date_time(Date,
orders=guess_formats(Date, c('dby', 'mdy'))))
#[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-02-05 UTC" "2016-02-05 UTC"
Regarding the use of ifelse
in the OP's code, the output we get in the numeric
class can be converted back to Date
class
v1 <- ifelse(grepl("-",foo$Date),
as.Date( foo$Date, format = "%d-%b-%y" ),
as.Date( foo$Date, format = "%m/%d/%Y" ))
as.Date(v1, origin='1970-01-01')
#[1] "2016-01-29" "2016-01-29" "2016-02-05" "2016-02-05"
Upvotes: 3