I'm trying to parse multiple date formats based on their position in a vector of dates. At some point the data switched the format it used from y/m/d to y/d/m. This is annoying for dates like 2010/07/03, which are valid in either format, so specifying the order in lubridate cannot tell them apart.
Here is an example vector of dates:
datevec <- c("2011/07/01", "2011/07/02", "2011/07/03", "2011/02/07" )
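To make the ambiguity concrete, here is a small base-R check. It is only an illustration and uses as.Date with explicit formats rather than lubridate. A string such as 2011/07/03 parses without error under both orders but yields two different dates:

```r
# Both candidate formats accept the same string, so the string alone
# cannot tell us which order was meant.
x <- "2011/07/03"
as_ymd <- as.Date(x, format = "%Y/%m/%d")  # year/month/day -> 2011-07-03
as_ydm <- as.Date(x, format = "%Y/%d/%m")  # year/day/month -> 2011-03-07
print(c(ymd = format(as_ymd), ydm = format(as_ydm)))
```

Because neither parse fails, the position of the string in the vector is the only reliable way to choose the format.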
The dates are arranged so that before a certain row they are in one format and from that row on they are in the other, so I'm trying to pass an index to the function. When I tried to parse them with the following lubridate code, it only returned 3 dates:
lapply(datevec, function(x, i) ifelse( x[i] <4, parse_date_time(x, "%Y-%m-%d"), parse_date_time(x,"%Y-%d-%m" )) )
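1) If we change the ifelse in the question to a plain if, the basic idea in the question works with appropriate modifications (ifelse drops the date-time class of its results, and in the question the index i is never actually passed in). Note that Map returns a list L, so, assuming we really want a vector, the last line of code concatenates it.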
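library(lubridate)
# Parse x with ymd if it comes before position 4, otherwise with ydm.
f <- function(x, i) if (i < 4)
  parse_date_time(x, "ymd") else parse_date_time(x, "ydm")
# Map() passes each date together with its position and returns a list.
L <- Map(f, datevec, seq_along(datevec), USE.NAMES = FALSE)
# Concatenate the list of POSIXct values into a single vector.
do.call("c", L)
## [1] "2011-07-01 UTC" "2011-07-02 UTC" "2011-07-03 UTC" "2011-07-02 UTC"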
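A short base-R check (an illustration, not part of the original answer) shows why the plain if matters: ifelse takes the attributes of its result from the test argument, so the Date or POSIXct class of the yes/no values is dropped and only the underlying numbers remain.

```r
d1 <- as.Date("2011-07-01")
d2 <- as.Date("2011-07-02")

# Vectorised ifelse(): the result carries no Date class, just day counts.
res_ifelse <- ifelse(c(TRUE, FALSE), d1, d2)
# Plain if/else returns one of the original objects unchanged.
res_if <- if (TRUE) d1 else d2

print(class(res_ifelse))  # "numeric"
print(class(res_if))      # "Date"
```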
2) Apply ifelse to the format rather than to the dates, and use as.Date instead of parse_date_time:
ix <- seq_along(datevec)
as.Date(datevec, ifelse(ix < 4, "%Y/%m/%d", "%Y/%d/%m"))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
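If the switch position varies between data sets, option 2 can be wrapped in a small function. The name parse_switched and its switch_at parameter are hypothetical, introduced only for this sketch. Like option 2, it relies on as.Date (via strptime) using the i-th format for the i-th string.

```r
# Hypothetical helper: parse a character vector whose format switches from
# year/month/day to year/day/month at position `switch_at`.
parse_switched <- function(x, switch_at) {
  # One format string per element, chosen by position.
  fmt <- ifelse(seq_along(x) < switch_at, "%Y/%m/%d", "%Y/%d/%m")
  as.Date(x, format = fmt)
}

datevec <- c("2011/07/01", "2011/07/02", "2011/07/03", "2011/02/07")
parsed <- parse_switched(datevec, switch_at = 4)
print(parsed)
```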
3) Convert the first 3 using ymd and the rest using ydm and then concatenate.
c(ymd(head(datevec, 3)), ydm(tail(datevec, -3)))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
4) Or, with only base R:
c(as.Date(head(datevec, 3)), as.Date(tail(datevec, -3), "%Y/%d/%m"))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
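The head/tail split assumes a single cut point. A base-R variant (a sketch, not from the original answer) works with any logical index by filling a preallocated Date vector one subset at a time; the name old_format is illustrative.

```r
datevec <- c("2011/07/01", "2011/07/02", "2011/07/03", "2011/02/07")

# Logical index of the rows written in the old year/month/day layout
# (here: everything before position 4; adjust to your data).
old_format <- seq_along(datevec) < 4

# Start from an all-NA Date vector of the right length, then fill each
# subset using its own format.
out <- rep(as.Date(NA), length(datevec))
out[old_format]  <- as.Date(datevec[old_format],  format = "%Y/%m/%d")
out[!old_format] <- as.Date(datevec[!old_format], format = "%Y/%d/%m")
print(out)
```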
5) Another approach is to use string manipulation to rewrite the later dates so that all of them are in the same format, and then use as.Date or ymd:
ix <- seq_along(datevec)
swap <- sub("(..)/(..)$", "\\2/\\1", datevec)
as.Date(ifelse(ix < 4, datevec, swap))
## [1] "2011-07-01" "2011-07-02" "2011-07-03" "2011-07-02"
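The regular expression in option 5 can be checked on a single string. Note that it assumes the day and month are always zero-padded to two characters, as in the example data.

```r
# "(..)/(..)$" captures the last two two-character fields (day and month in
# the newer y/d/m strings); "\\2/\\1" writes them back in swapped order.
swapped <- sub("(..)/(..)$", "\\2/\\1", "2011/02/07")
print(swapped)  # "2011/07/02"
```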
6) The code in 2) through 5) returns Date class, which is more appropriate for dates without times. If for some reason you really need POSIXct, apply as.POSIXct to the results above, or use parse_date_time like this:
c(parse_date_time(head(datevec, 3), "ymd"), parse_date_time(tail(datevec, -3), "ydm"))
## [1] "2011-07-01 UTC" "2011-07-02 UTC" "2011-07-03 UTC" "2011-07-02 UTC"
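For the as.POSIXct route, here is a base-R sketch. It converts the Date result of option 2 through format() with an explicit tz = "UTC". UTC is an assumption chosen to match what parse_date_time prints; use a different zone if your data need it.

```r
datevec <- c("2011/07/01", "2011/07/02", "2011/07/03", "2011/02/07")
ix <- seq_along(datevec)

# Parse as Date first, choosing the format per element as in option 2.
d <- as.Date(datevec, ifelse(ix < 4, "%Y/%m/%d", "%Y/%d/%m"))

# Convert to POSIXct at midnight UTC; going through format() with an
# explicit tz avoids depending on the session's local time zone.
p <- as.POSIXct(format(d), tz = "UTC")
print(p)
```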