Reputation:
While converting a data frame to xts, I realized that there is something wrong with the date formatting. Here's an example data frame:
effective_date price
"1990-01-01" "100"
"1990-01-02 00:05:00" "200"
This is example output from a package that I use.
Converting this to xts should be straightforward:
xts(df["price"], order_by=as.POSIXct(df["effective_date"], format="%Y-%m-%d %H:%M:%S")
However, this errors out, saying NAs can't be in row names, and the result is:
<NA> 100
1990-01-02 00:05:00 200
Apparently xts can't figure out what to do with the date that has no time part (midnight), and it won't coerce it.
If I add tz="UTC" to as.POSIXct, it still doesn't work. Additionally, using as.POSIXlt doesn't change anything here either.
What can I do to coerce that midnight date to the correct format?
Upvotes: 2
Views: 3546
Reputation: 269854
1) To get the "POSIXct" datetime vector, try converting each datetime to "POSIXct" separately and then concatenating the results:
do.call("c", lapply(df$effective_date, as.POSIXct))
2) Another base solution that is even shorter, and also substantially faster, is the following, which relies on the fact that as.POSIXct will ignore junk at the end:
as.POSIXct(paste(df$effective_date, "00:00:00"))
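Both approaches can be checked against the data from the question. This is only a minimal sketch: it assumes effective_date is a character column, and it pins tz = "UTC" (my assumption, not part of the question) so the result does not depend on the local time zone.

```r
# Example data from the question; the column types are an assumption
df <- data.frame(
  effective_date = c("1990-01-01", "1990-01-02 00:05:00"),
  price = c(100, 200),
  stringsAsFactors = FALSE
)

# Approach 1: parse each element on its own, then concatenate the results
idx1 <- do.call("c", lapply(df$effective_date, as.POSIXct, tz = "UTC"))

# Approach 2: append a time to every string; on strings that already
# carry a time, the extra " 00:00:00" is trailing text and is ignored
idx2 <- as.POSIXct(paste(df$effective_date, "00:00:00"), tz = "UTC")

# Format explicitly in UTC so the printout is the same on any machine
print(format(idx1, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
print(format(idx2, "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# Both approaches should describe the same instants
stopifnot(all(as.numeric(idx1) == as.numeric(idx2)))
```

Neither result contains an NA, so either vector can be passed to xts as the index.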
Upvotes: 7
Reputation: 368389
Two issues:
1) You cannot parse a date alone as POSIXct with a given format that also expects a time:
R> as.POSIXct(c("2017-01-02", "2017-01-03 04:05:06"), format="%Y-%m-%d %H:%M:%S")
[1] NA "2017-01-03 04:05:06 CST"
R>
2) You can, however, use the anytime() function to do it:
R> anytime::anytime(c("2017-01-02", "2017-01-03 04:05:06"))
[1] "2017-01-02 00:00:00 CST" "2017-01-03 04:05:06 CST"
R>
Once you have a POSIXct, forming the xts is easy.
Also note that you have typos: you need a comma before the column indicator, as in df[, "price"], and the xts argument is order.by, not order_by.
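For completeness, here is a sketch of what the corrected call might look like on the data from the question. It assumes the xts package is installed and that price is numeric. The index is parsed element by element in base R only to keep the example free of further dependencies (any POSIXct vector would do), and tz = "UTC" is my assumption.

```r
library(xts)  # assumed to be installed

# Example data from the question; the column types are an assumption
df <- data.frame(
  effective_date = c("1990-01-01", "1990-01-02 00:05:00"),
  price = c(100, 200),
  stringsAsFactors = FALSE
)

# Any POSIXct vector without NAs works as the index; here each string
# is parsed on its own so the date-only entry becomes midnight
idx <- do.call("c", lapply(df$effective_date, as.POSIXct, tz = "UTC"))

# Note the comma in df[, "price"] and the argument name order.by
x <- xts(df[, "price"], order.by = idx)
print(x)
```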
Edit: Getting a little tired of @42's comment about Gabor's (fine) solution "dominating" this one, so here's a minimal benchmark:
R> library(microbenchmark)
R> v <- c("2017-01-02", "2017-01-03 04:05:06")
R> library(anytime)
R> print(microbenchmark(anytime(v), do.call("c", lapply(v, as.POSIXct))), digits=3)
Unit: microseconds
                                expr   min    lq  mean median    uq   max neval cld
                          anytime(v)  33.6  36.8  42.1   45.6  46.6  80.7   100  a
 do.call("c", lapply(v, as.POSIXct)) 571.5 579.1 586.4  586.8 589.5 695.7   100   b
R>
So in short: "not really". It uses only base R, which is a plus, but it is a) harder to read and understand, b) more limited, as it deals with exactly one format (ISO style), and c) about thirteen times slower.
Upvotes: 8
Reputation: 43354
Most of lubridate's parsing functions have a truncated parameter that takes the number of elements that can be missing from the end. Missing elements will be replaced by zero.
Example with the data at hand:
lubridate::ymd_hms(c("2017-01-02", "2017-01-03 04:05:06"), truncated = 3)
## [1] "2017-01-02 00:00:00 UTC" "2017-01-03 04:05:06 UTC"
Upvotes: 4
Reputation: 4378
Assuming you want the timestamps, preprocess with something like:
temp <- c("1990-01-01", "1990-01-02 00:05:00")
# match a date string at the end of the string (indicated by $). Replace it
# with the matched date (indicated by \\1) followed by 00:00:00
temp2 <- gsub("(\\d{4}\\-\\d{2}\\-\\d{2}$)", "\\1 00:00:00", temp)
# [1] "1990-01-01 00:00:00" "1990-01-02 00:05:00"
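Once every string has a time part, a single fixed format parses the whole vector. A minimal sketch of that follow-up step, where the anchored pattern and tz = "UTC" are my assumptions rather than part of the snippet above:

```r
temp <- c("1990-01-01", "1990-01-02 00:05:00")

# Pad date-only strings with midnight; ^ and $ anchor the pattern so
# strings that already carry a time are left untouched
temp2 <- gsub("^(\\d{4}-\\d{2}-\\d{2})$", "\\1 00:00:00", temp)

# Now one explicit format fits every element, so no NA is produced
idx <- as.POSIXct(temp2, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
stopifnot(!anyNA(idx))
```

The resulting idx can then be used as the order.by argument of xts.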
Upvotes: 1