Reputation: 1002
I have found a very annoying problem that I want to share with the community. I have found an acceptable solution (detailed below), but I now have several follow-up questions. My knowledge of time stamps and POSIX variables is limited, particularly how plyr, dplyr, and readr handle them.
When working with POSIX variables (i.e., date-time stamps), I found that write_csv from readr changed these variables to UTC time.
I am downloading data from an API and preserving the time stamp. Each time I grab data, I bind it to an existing file and save the file. My timezone is MDT, and I am requesting data using MDT time, which I am then trying to bind to a file stored in UTC time. The times don't match, and it gets messy and frustrating. In essence, the beautiful time-stamp database I am trying to build is turning into a pile of garbage.
To remedy this problem, I converted the POSIX time column to a character column using:
df$time <- as.character(df$time)
This allowed me to save the files in a time zone consistent with the time stamps being returned to me by the API.
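For reference, a minimal sketch of that workaround (the column name `time` and the sample row are assumptions, not the real API output): formatting the column as character with an explicit UTC offset means write_csv has nothing left to convert.

```r
library(readr)

# Hypothetical one-row data frame standing in for the API output
df <- data.frame(time  = as.POSIXct("2018-07-23 16:13:00", tz = "America/Denver"),
                 close = 7747.01)

# Format as character before writing; %z appends the UTC offset
# (e.g. "-0600") so the stamp stays unambiguous in the CSV
df$time <- format(df$time, "%Y-%m-%d %H:%M:%S %z")
write_csv(df, "candles.csv")
```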
This leads me to the following series of questions:
EDIT: I have included some example data of what I am talking about:
> df1 <- as.data.frame(fromJSON("https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-23&12:57:00?stop=2018-07-23&19:34:58granularity=300"))
> colnames(df1) <- c("time", "low", "high", "open", "close", "volume")
> df1$time <- anytime(df1$time)
> df1Sort <- df1[order(df1$time),]
> head(df1Sort, 5)
time low high open close volume
299 2018-07-23 16:13:00 7747.00 7747.01 7747.01 7747.01 9.2029168
298 2018-07-23 16:14:00 7743.17 7747.01 7747.00 7747.01 7.0205668
297 2018-07-23 16:15:00 7745.47 7745.73 7745.67 7745.73 0.9075707
296 2018-07-23 16:16:00 7745.72 7745.73 7745.72 7745.73 4.6715157
295 2018-07-23 16:17:00 7745.72 7745.73 7745.72 7745.72 2.4921921
> write_csv(df1Sort, "df1Sort.csv", col_names = TRUE)
> df2 <- read_csv("df1Sort.csv", col_names = TRUE)
Parsed with column specification:
cols(
time = col_datetime(format = ""),
low = col_double(),
high = col_double(),
open = col_double(),
close = col_double(),
volume = col_double()
)
> head(df2, 5)
# A tibble: 5 x 6
time low high open close volume
<dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2018-07-23 22:13:00 7747 7747 7747 7747 9.20
2 2018-07-23 22:14:00 7743 7747 7747 7747 7.02
3 2018-07-23 22:15:00 7745 7746 7746 7746 0.908
4 2018-07-23 22:16:00 7746 7746 7746 7746 4.67
5 2018-07-23 22:17:00 7746 7746 7746 7746 2.49
Upvotes: 8
Views: 5148
Reputation: 145775
"Is there a program that can join POSIX variables across time zones... without having to convert them to the same time zone first?"
Maybe? But if so, it's almost certainly just converting to UTC under the hood and hiding it from you. I'm unaware of anything like this in R (data.table is the only package I know of that can join on anything other than exact equality, and it doesn't have this feature). If I were you, I'd just convert everything to one timezone - probably UTC.
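A sketch of that normalization, assuming lubridate is acceptable: `with_tz()` changes only the display timezone, not the underlying instant, so equality joins on the timestamp still line up.

```r
library(lubridate)

t_mdt <- as.POSIXct("2018-07-23 16:13:00", tz = "America/Denver")
t_utc <- with_tz(t_mdt, tzone = "UTC")

t_utc           # same instant, now displayed in UTC (2018-07-23 22:13:00)
t_mdt == t_utc  # TRUE: POSIXct comparison uses the instant, not the label
```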
For more reading on best practices, this SQL-focused answer seems very good.
"Is it possible to prevent write_csv from changing POSIX variables to UTC?"
Not built-in. The ?write_csv documentation is pretty clear: it doesn't list any options for this, and it does say "POSIXct's are formatted as ISO8601."
"Is there a csv write function that doesn't change POSIX variables?"
Sure, the built-in write.csv doesn't change to UTC (I think it uses your system settings), and data.table::fwrite offers quite a few options. If you want to control how your dates are saved, your best bet is to convert them to character in whatever format you want; any of the writing functions should then handle them just fine. Check out the ?data.table::fwrite documentation - it has good info, including a warning that the "write.csv" option can be quite slow.
You should include reproducible examples with your questions. Here's one for this:
t = as.POSIXct("2018-01-01 01:30:00", tz = "Africa/Addis_Ababa")
t
# [1] "2018-01-01 01:30:00 EAT"
d = data.frame(t)
library(readr)
write_csv(d, "tz_test.csv")
system("head tz_test.csv")
# 2017-12-31T22:30:00Z
library(data.table)
fwrite(d, "tz_test_dt.csv", dateTimeAs = "write.csv")
system("head tz_test_dt.csv")
# t
# 2018-01-01 01:30:00
write.csv(d, "tz_test_base.csv")
system("head tz_test_base.csv")
# "","t"
# "1",2018-01-01 01:30:00
Upvotes: 5
Reputation: 54
It looks like you're using libraries from the tidyverse; have you had a look at the lubridate package?
The help file for as_datetime() may help you convert a date-time variable to your desired timezone before you append/join your data.
For example:
> dt_utc <- ymd_hms("2010-08-03 00:50:50")
> dt_utc
[1] "2010-08-03 00:50:50 UTC"
> as_datetime(dt_utc, tz = "Australia/Melbourne")
[1] "2010-08-03 10:50:50 AEST"
Upvotes: 1