user8959427

Reputation: 2067

rgdax (coinbase) data not collecting data as expected

I am trying to use the rgdax package for R to download some historical prices.

I set my API key etc and I try to load in the past 24 hours:

start <- strftime(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
end <- strftime(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC") 

df <- public_candles(product_id = "ETH-EUR", granularity = 300, start = start, end = end)

However, this loads in "too much" data.

I want the last 24 hours of data, but it loads a little more than that.

head()
                   time    low   high   open  close volume
329 2019-01-22 16:25:00 104.09 104.12 104.09 104.09  16.03
328 2019-01-22 16:30:00 104.11 104.14 104.12 104.13  21.61
327 2019-01-22 16:35:00 103.88 104.12 104.10 103.97 161.35
326 2019-01-22 16:40:00 103.96 103.97 103.96 103.97  26.59
325 2019-01-22 16:45:00 103.97 104.20 103.97 104.19  48.57
324 2019-01-22 16:50:00 104.19 104.36 104.20 104.36  45.40

tail()

                 time    low   high   open  close volume
6 2019-01-23 21:05:00 101.34 101.64 101.64 101.41  42.93
5 2019-01-23 21:10:00 101.42 101.58 101.42 101.54  24.03
4 2019-01-23 21:15:00 101.54 101.64 101.54 101.64  37.73
3 2019-01-23 21:20:00 101.60 101.68 101.60 101.61  35.97
2 2019-01-23 21:25:00 101.59 101.66 101.66 101.59  30.99
1 2019-01-23 21:30:00 101.59 101.62 101.60 101.59  12.91

I want the data to start 24 hours before Sys.time() (i.e. at 2019-01-22 21:30:00 and not 2019-01-22 16:50:00), or 24 hours before the last observation in the tail().

I then tried the following, setting start to 86400 seconds (24 hours) before the current time:

start <- strftime(Sys.time() - 86400, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
end <- strftime(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC") 

Nothing changes.

The documentation states (page 11) that it takes:

start Optional parameter. Start time in ISO 8601
end Optional parameter. End time in ISO 8601

I believe I have put the timestamps in this format (correct me if I am wrong).

Documentation: https://cran.r-project.org/web/packages/rgdax/rgdax.pdf

EDIT:

The following seems to filter the data so that I have just 24 hours' worth of data:

library(dplyr)       # for %>% and filter()
library(tibbletime)  # for tbl_time()

x <- df %>%
  tbl_time(index = time) %>%
  filter(time > Sys.time() - 86400 - 288 * 3)

head(x , 1)
tail(x, 1)

But I now only have 265 five-minute periods, whereas there are 288 five-minute periods in 24 hours. It would still be nice to download exactly 24 hours' worth of data straight from the platform.
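
A minimal sketch for checking which five-minute slots are missing, assuming the candle timestamps fall on multiples of the granularity (as they do in the head() and tail() output above). The helpers expected_times and missing_candles are hypothetical names, not part of rgdax:

```r
granularity <- 300                    # seconds per candle
expected_n  <- 86400 / granularity    # 288 candles in 24 hours

# Hypothetical helper: the n most recent candle boundaries up to end_time.
expected_times <- function(end_time, n, step) {
  last <- floor(as.numeric(end_time) / step) * step
  as.POSIXct(last - step * ((n - 1):0), origin = "1970-01-01", tz = "UTC")
}

# Hypothetical helper: expected boundaries that have no observed candle.
missing_candles <- function(observed, expected) {
  expected[!(as.numeric(expected) %in% as.numeric(observed))]
}
```

With the data above, something like missing_candles(x$time, expected_times(Sys.time(), expected_n, granularity)) would list the slots for which no candle was returned.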

Upvotes: 3

Views: 251

Answers (1)

niko

Reputation: 5281

I believe the problem lies in the request made by rgdax::public_candles (it uses curl, if I'm not mistaken).

Prelude

Setup

Here are a few variables that will be used in what follows:

# your variables
start <- strftime(Sys.time() - 86400, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
end <- strftime(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC") 
product_id <- "ETH-EUR"
granularity <- 300
# request url
req.url <- paste0("https://api.pro.coinbase.com/products/", product_id, "/candles")
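
As a side note, the two Sys.time() calls above are evaluated at slightly different moments. A minimal sketch, assuming a hypothetical helper iso8601_utc (not part of rgdax), takes a single snapshot of the clock so that start and end are exactly 86400 seconds apart:

```r
# Hypothetical helper (not part of rgdax): format a POSIXct as ISO 8601 in UTC.
iso8601_utc <- function(x) format(x, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Take one snapshot of the clock, truncated to whole seconds.
now <- as.POSIXct(floor(as.numeric(Sys.time())), origin = "1970-01-01", tz = "UTC")

start <- iso8601_utc(now - 86400)  # 24 hours before the snapshot
end   <- iso8601_utc(now)
```

This does not change the behaviour discussed below; it only makes the requested window well defined.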

Problem

I was able to reproduce the issue using rgdax::public_candles, and also by simply using jsonlite to access the data via the URL directly, without any query parameters:

# fetching the data ourselves
res <- jsonlite::fromJSON(req.url)
res <- as.data.frame(res)
# checking the dates
res[['V1']] <- as.POSIXct(.subset2(res,1L), origin="1970-01-01")
c(min(res$V1),max(res$V1))
# [1] "2019-01-23 18:54:00 CET" "2019-01-24 00:42:00 CET"           # Problem still here

Solution

Here is a solution where we formulate the GET request ourselves, making sure to specify the query parameters:

# fetching the data ourselves - the return
res <- httr::GET(url = req.url, 
                 query = list(start = start, end = end, 
                              granularity = granularity))
res <- as.data.frame(t(matrix(unlist(httr::content(res)), nrow = 6)))
res[['V1']] <- as.POSIXct(.subset2(res,1L), origin="1970-01-01")
c(min(res$V1),max(res$V1)) 
# [1] "2019-01-23 00:40:00 CET" "2019-01-24 00:40:00 CET"           # Problem solved
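
The same approach can be wrapped up as a rough sketch. The function names candles_to_df and fetch_candles are hypothetical, and the column order (time, low, high, open, close, volume) is an assumption taken from the column names in the question's output:

```r
# Hypothetical helper: turn the parsed JSON (a list of 6-element candles)
# into a data frame with named columns, sorted by time.
candles_to_df <- function(raw) {
  m  <- matrix(as.numeric(unlist(raw)), ncol = 6, byrow = TRUE)
  df <- as.data.frame(m)
  names(df) <- c("time", "low", "high", "open", "close", "volume")  # assumed order
  df$time <- as.POSIXct(df$time, origin = "1970-01-01", tz = "UTC")
  df[order(df$time), ]
}

# Hypothetical wrapper around the GET request shown above (needs httr and network access).
fetch_candles <- function(product_id, start, end, granularity) {
  req.url <- paste0("https://api.pro.coinbase.com/products/", product_id, "/candles")
  res <- httr::GET(url = req.url,
                   query = list(start = start, end = end, granularity = granularity))
  httr::stop_for_status(res)  # fail loudly on an HTTP error
  candles_to_df(httr::content(res))
}
```

It would be called as fetch_candles(product_id, start, end, granularity) with the variables from the setup.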

Comment

Simply typing rgdax::public_candles in the console (which prints the function's source) gives us some insight into where the issue might lie. The way I see it, the problem is located in this line:

content <- parse_response(path = req.url, query = list(start = start, 
                          end = end, granularity = granularity))

I do not know the function parse_response and I haven't investigated further, but it seems to fail to pass the query parameters on to the request.

Update 1:

I checked curl and openssl (the two packages imported by rgdax): parse_response is not in either of their namespaces, nor is it exported by rgdax. I suspect it is a non-exported rgdax function.

Update 2:

As suspected, parse_response is a non-exported function of rgdax. Inside it, the line

url <- modify_url(api.url, path = path, query = query)

should handle the query parameters, I suppose. However, I could not find modify_url anywhere, which may be why the request ends up using the default query parameters.

Upvotes: 5
