DeltaIV
DeltaIV

Reputation: 5646

convert data frame column containing different date formats to Date object

I have a data frame column of class character containing different date formats:

foo=data.frame(Date=c("29-Jan-16","29-Jan-16","2/5/2016","2/5/2016"),stringsAsFactors = F)

I would like to convert the column Date to a vector of class Date objects. I can correctly parse each format separately:

> as.Date( foo$Date, format = "%d-%b-%y" )
[1] "2016-01-29" "2016-01-29" NA           NA          
> as.Date( foo$Date, format = "%m/%d/%Y" )
[1] NA           NA           "2016-02-05" "2016-02-05"

So I thought to join the two parsing instructions with ifelse and grepl. Note that grepl correctly identifies the rows where the first format is used

> grepl("-",foo$Date)
[1]  TRUE  TRUE FALSE FALSE

However, the complet instruction doesn't work:

bar = foo
bar$Date=ifelse(grepl("-",foo$Date),
                      as.Date( foo$Date, format = "%d-%b-%y" ),
                      as.Date( foo$Date, format = "%m/%d/%Y" ))

> bar
   Date
1 16829
2 16829
3 16836
4 16836

Questions:

  1. Can you help me understand what's happening?
  2. Even if I manage to fix my solution with your help, I learn more about R and that's great, but the solution remains suboptimal. The reason is that the "brilliant" person who populates the dataframe may choose to use even more date formats (it has already happened before, and it will likely happen again). I will then have to nest more ifelse, and create more complex regexps. The code will soon become nasty and unreadable. Isn't there a way to have R automatically find the right data format for each element of foo$Date?

Upvotes: 3

Views: 235

Answers (1)

akrun
akrun

Reputation: 887511

It would be easier to work with lubridate. Assuming that the format of the 'Date' is in the order day, month, year, we can use dmy.

library(lubridate)
dmy(foo$Date)
#[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-05-02 UTC" "2016-05-02 UTC"

In case there are other variations in the order, we can also use guess_formats with parse_date_time.

 with(foo, parse_date_time(Date, 
         orders=guess_formats(Date, c('dby', 'mdy'))))
 #[1] "2016-01-29 UTC" "2016-01-29 UTC" "2016-02-05 UTC" "2016-02-05 UTC"

Regarding the use of ifelse in the OP's code, the output we get in the numeric class can be converted back to Date class

v1 <- ifelse(grepl("-",foo$Date),
                  as.Date( foo$Date, format = "%d-%b-%y" ),
                  as.Date( foo$Date, format = "%m/%d/%Y" ))

as.Date(v1, origin='1970-01-01')
#[1] "2016-01-29" "2016-01-29" "2016-02-05" "2016-02-05"

Upvotes: 3

Related Questions