user6467981

R drops hours, minutes, and seconds from date

While converting a data frame to xts I realized that something is going wrong with the date parsing. Here's an example data frame:

effective_date          price
"1990-01-01"            "100"
"1990-01-02 00:05:00"   "200"

This is example output from a package that I use.

Converting this to xts should be straightforward:

xts(df["price"], order.by=as.POSIXct(df["effective_date"], format="%Y-%m-%d %H:%M:%S"))

However, this errors out, saying NAs can't be in row names, and the result is:

<NA>                 100
1990-01-02 00:05:00  200

Obviously the parser can't figure out what to do with the date that has no time component (implicitly midnight), so it won't coerce it.

Adding tz="UTC" to as.POSIXct doesn't help, and using as.POSIXlt doesn't change anything here either.

What can I do to coerce that midnight date to the correct format?
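
For reference, here's a minimal version of what I'm working with (column names as in the table above):

df <- data.frame(
  effective_date = c("1990-01-01", "1990-01-02 00:05:00"),
  price = c("100", "200"),
  stringsAsFactors = FALSE
)

# The date-only entry does not match the full format and comes back as NA
as.POSIXct(df$effective_date, format = "%Y-%m-%d %H:%M:%S")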

Upvotes: 2

Views: 3546

Answers (4)

G. Grothendieck

Reputation: 269854

1) To get the "POSIXct" datetime vector, try converting each datetime to "POSIXct" separately and then concatenating the results:

do.call("c", lapply(df$effective_date, as.POSIXct))
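
For example, with a data frame like the one in the question this might be used along these lines (a sketch; df and the column names are assumed from the question, where the prices arrive as strings):

library(xts)

# convert each entry on its own, then combine into a single POSIXct vector
tt <- do.call("c", lapply(df$effective_date, as.POSIXct))

# prices come in as character, so convert before building the xts object
xts(as.numeric(df[, "price"]), order.by = tt)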

2) Another base solution that is even shorter, and also substantially faster, is the following, which relies on the fact that as.POSIXct will ignore trailing junk after the part it can parse.

as.POSIXct(paste(df$effective_date, "00:00:00"))
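
With the sample dates the idea looks like this (parsed in the session's local timezone unless tz= is supplied):

# "1990-01-01 00:00:00"          -> parsed as midnight
# "1990-01-02 00:05:00 00:00:00" -> the trailing " 00:00:00" is ignored
as.POSIXct(paste(c("1990-01-01", "1990-01-02 00:05:00"), "00:00:00"))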

Upvotes: 7

Dirk is no longer here

Reputation: 368389

Two issues:

1) You cannot parse a date-only string as POSIXct with a format that expects a time component:

R> as.POSIXct(c("2017-01-02", "2017-01-03 04:05:06"), format="%Y-%m-%d %H:%M:%S")
[1] NA                        "2017-01-03 04:05:06 CST"
R>

2) You can, however, use the anytime() function to do it:

R> anytime::anytime(c("2017-01-02", "2017-01-03 04:05:06"))
[1] "2017-01-02 00:00:00 CST" "2017-01-03 04:05:06 CST"
R> 

Once you have a POSIXct, forming the xts is easy.
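
For instance, something along these lines (a sketch; df and its columns are assumed from the question):

library(xts)
library(anytime)

# parse the mixed date / datetime strings, then index the prices by them
xts(as.numeric(df[, "price"]), order.by = anytime(df[, "effective_date"]))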

Also note that you have a typo: you need a comma before the column indicator, i.e. df[, "price"].

Edit: Getting a little tired of @42's comment about Gabor's (fine) solution "dominating" this one, so here's a minimal benchmark:

R> library(microbenchmark)
R> v <- c("2017-01-02", "2017-01-03 04:05:06")
R> library(anytime)
R> print(microbenchmark(anytime(v), do.call("c", lapply(v, as.POSIXct))), digits=3)
Unit: microseconds
                                expr   min    lq  mean median    uq   max neval cld
                          anytime(v)  33.6  36.8  42.1   45.6  46.6  80.7   100  a 
 do.call("c", lapply(v, as.POSIXct)) 571.5 579.1 586.4  586.8 589.5 695.7   100   b
R> 

So, in short, "not really". It uses only base R, which is a plus, but it is a) harder to read and understand, b) more limited, as it deals with exactly one format (in ISO style), and c) about thirteen times slower.

Upvotes: 8

alistaire

Reputation: 43354

Most of lubridate's parsing functions have a truncated parameter specifying how many format elements may be missing from the end; missing elements are filled in with zeros.

Example with the data at hand:

lubridate::ymd_hms(c("2017-01-02", "2017-01-03 04:05:06"), truncated = 3)
## [1] "2017-01-02 00:00:00 UTC" "2017-01-03 04:05:06 UTC"
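
Applied to the question's data that could look roughly like this (df and its column names assumed from the question):

library(lubridate)
library(xts)

# truncated = 3 allows hours, minutes, and seconds to all be absent
idx <- ymd_hms(df$effective_date, truncated = 3)
xts(as.numeric(df$price), order.by = idx)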

Upvotes: 4

Andrew Lavers

Reputation: 4378

Assuming you want the timestamps, preprocess with something like:

temp <- c("1990-01-01", "1990-01-02 00:05:00")

# Match a date-only string at the end of the string (anchored by $) and
# replace it with the captured date (\\1) followed by " 00:00:00"
temp2 <- gsub("(\\d{4}-\\d{2}-\\d{2}$)", "\\1 00:00:00", temp)

# [1] "1990-01-01 00:00:00" "1990-01-02 00:05:00"
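
After the padding, a strict format string matches both entries (parsed in the local timezone unless tz= is given):

# both strings now carry a time component, so the full format parses them
as.POSIXct(temp2, format = "%Y-%m-%d %H:%M:%S")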

Upvotes: 1
