histelheim

Reputation: 5098

Manual API rate limiting

I am trying to write a manual rate-limiting function for the rgithub package. So far this is what I have:

library(rgithub)
library(plyr)      # try_default()
library(magrittr)  # %>%

# owner and repo are assumed to be defined in the calling environment
pull <- function(i) {
    # first page of commits for pull request i
    commits <- get.pull.request.commits(owner = owner, repo = repo, id = i,
                                        ctx = get.github.context(), per_page = 100)
    links <- digest_header_links(commits)
    number_of_pages <- links[2, ]$page
    if (number_of_pages != 0)
        try_default(for (n in 1:number_of_pages) {
            # wait until the reset time if fewer than 5 requests remain
            if (as.integer(commits$headers$`x-ratelimit-remaining`) < 5)
                Sys.sleep(as.integer(commits$headers$`x-ratelimit-reset`) -
                          as.POSIXct(Sys.time()) %>% as.integer())
            else
                get.pull.request.commits(owner = owner, repo = repo, id = i,
                                         ctx = get.github.context(),
                                         per_page = 100, page = n)
        }, default = NULL)
    else
        return(commits)
}

list <- c(500, 501, 502)

pull_lists <- lapply(list, pull)

The intention is that if the x-ratelimit-remaining header drops below a certain threshold, the script should wait until the time given in x-ratelimit-reset has passed and then continue. However, I'm not sure whether the if/else setup I have here actually behaves that way.
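The intended check can be pulled out into a small pure function, which makes it testable without calling the API at all. This is a minimal sketch: the name `rate_limit_wait` and its `now` argument are hypothetical, and it assumes `x-ratelimit-reset` is a Unix timestamp in seconds, which is how GitHub reports it.

```r
# Seconds to wait before the next request, given the values of the
# x-ratelimit-remaining and x-ratelimit-reset headers.
# Assumption: `reset` is a Unix timestamp (seconds since the epoch).
rate_limit_wait <- function(remaining, reset, threshold = 5, now = Sys.time()) {
    remaining <- as.integer(remaining)
    # enough requests left: no need to wait
    if (remaining >= threshold) return(0)
    # time until the reset moment, never negative
    max(0, as.numeric(reset) - as.numeric(now))
}

t0 <- as.POSIXct(1700000000, origin = "1970-01-01", tz = "UTC")
rate_limit_wait("3", "1700000060", now = t0)   # 60
rate_limit_wait("10", "1700000060", now = t0)  # 0
```

Passing `now` explicitly is only there so the function can be checked deterministically; in real use the default `Sys.time()` applies.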

The function runs fine, but I have some doubts about whether it actually does the rate limiting or somehow skips that step. Hence my questions: a) how can I find out whether it actually does rate limiting, and b) if it doesn't, how can I rewrite it so that it does? Would a while condition/loop perhaps be better?
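For reference, in the code above `commits` is never reassigned inside the loop, so every iteration checks the headers of the first response only; when the sleep branch runs, page `n` is not fetched; and the value of a `for` loop is `NULL`, so the fetched pages are discarded. A minimal sketch of a loop that avoids all three issues is below. `fetch_page(n)` is a hypothetical stand-in for something like `get.pull.request.commits(..., page = n)` and is assumed to return a list with a `headers` element, as `commits$headers` does in the question; the `sleep` argument only exists so the function can be tested without real waiting.

```r
# Fetch pages 1..n_pages, checking the rate-limit headers of each new response.
# fetch_page(n) is hypothetical: it must return a list with a `headers` element
# holding x-ratelimit-remaining and x-ratelimit-reset (Unix seconds).
fetch_all_pages <- function(fetch_page, n_pages, threshold = 5, sleep = Sys.sleep) {
    results <- vector("list", n_pages)
    n <- 1
    while (n <= n_pages) {
        resp <- fetch_page(n)   # always fetch the page, never skip it
        results[[n]] <- resp    # keep the result instead of discarding it
        h <- resp$headers       # headers of the response just received
        remaining <- as.integer(h[["x-ratelimit-remaining"]])
        if (remaining < threshold && n < n_pages) {
            secs <- as.numeric(h[["x-ratelimit-reset"]]) - as.numeric(Sys.time())
            if (secs > 0) sleep(secs)  # wait until the limit resets
        }
        n <- n + 1
    }
    results
}
```

A `for` loop would work just as well here; what matters is that the headers are refreshed on every response and that sleeping happens before, not instead of, the next request.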

Upvotes: 4

Views: 612

Answers (2)

cmbarbu

Reputation: 4532

You can test whether it does the rate limiting by changing 5 to a large enough number and printing how long Sys.sleep actually takes, using:

print(system.time(Sys.sleep(...)))
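One way to apply this inside `pull()` is to replace the bare `Sys.sleep()` call with a wrapper that reports each pause and its measured duration. A minimal sketch; `logged_sleep` is a hypothetical name. Note that with a very high threshold the wait lasts until the GitHub reset time, which can be a long time.

```r
# Report each rate-limit pause and how long it really took.
logged_sleep <- function(seconds) {
    message(sprintf("Rate limit reached: sleeping %.1f s", seconds))
    # elapsed wall-clock time of the sleep, in seconds
    elapsed <- system.time(Sys.sleep(seconds))[["elapsed"]]
    message(sprintf("Slept for %.2f s", elapsed))
    invisible(elapsed)
}

logged_sleep(0.2)
```

If the "Rate limit reached" message never appears even with the threshold raised, the sleep branch is not being reached.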

That said, the function seems OK to me. Unfortunately, I cannot test it easily, as rgithub is not available for my version of R (3.1.3).

Upvotes: 1

jangorecki

Reputation: 16697

Not a canonical answer, but a working example.
You should add some logging to your script, even something as simple as write.table(..., append = TRUE) (note that write.csv ignores append = TRUE, with a warning).
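A minimal sketch of such a log, writing one CSV row per API call. The function name and the logged columns are hypothetical choices; `write.table()` with `sep = ","` is used because `write.csv()` does not honour `append`.

```r
# Append one row per API call to a CSV log file.
log_api_call <- function(file, id, page, remaining) {
    new_file <- !file.exists(file)
    row <- data.frame(time = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                      id = id, page = page, remaining = remaining)
    # header only on the first write, append afterwards
    write.table(row, file, sep = ",", row.names = FALSE,
                col.names = new_file, append = !new_file)
}

f <- tempfile(fileext = ".csv")
log_api_call(f, 500, 1, 4999)
log_api_call(f, 500, 2, 4998)
read.csv(f)
```

Logging the remaining-requests value on every call shows directly whether the script ever gets close to the threshold.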

I've implemented an automatic anti-DDoS mechanism that prevents your IP from being banned by the exchange. You can find it in jangorecki/Rbitcoin/R/utils.R.
Rbitcoin.last_api_call is an environment object stored in the package namespace, acting as a kind of session-level package cache.
This can help you set up something similar in your package.
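The general idea can be sketched as follows. This is not the Rbitcoin code; `.api_state`, `throttle` and `min_interval` are hypothetical names. An environment created with `new.env()` stays mutable even when its binding lives in a locked package namespace, so it can record the time of the last call per API source and enforce a minimum gap between calls.

```r
# Session-level cache: time of the last call, keyed by API source.
.api_state <- new.env(parent = emptyenv())

# Wait, if needed, so that calls to the same source are at least
# min_interval seconds apart, then record the current call time.
throttle <- function(source, min_interval) {
    last <- .api_state[[source]]   # NULL if this source was never called
    if (!is.null(last)) {
        since <- as.numeric(difftime(Sys.time(), last, units = "secs"))
        wait <- min_interval - since
        if (wait > 0) Sys.sleep(wait)
    }
    assign(source, Sys.time(), envir = .api_state)
    invisible(NULL)
}

throttle("github", 1)     # first call: no wait
throttle("github", 1)     # waits about 1 second
throttle("bitbucket", 1)  # separate source: no wait
```

Keying the cache by source is one way to manage limits for several APIs at once.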

You should also consider an optional version that supports parallel execution, for example by linking to a database that supports concurrent reads. My function can easily be modified to queue calls and recheck the timing every X seconds.
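A minimal, sequential sketch of the "queue calls and recheck every X seconds" idea (no parallelism or database). `run_queue`, `can_call` and `recheck` are hypothetical names: `can_call()` stands for whatever check decides that a request is currently allowed, for example one based on the rate-limit headers.

```r
# Run queued calls in order; while no request is allowed,
# poll again every `recheck` seconds instead of failing.
run_queue <- function(queue, can_call, recheck = 1, sleep = Sys.sleep) {
    results <- vector("list", length(queue))
    for (i in seq_along(queue)) {
        while (!can_call()) sleep(recheck)  # wait until a request is allowed
        results[[i]] <- queue[[i]]()        # perform the queued call
    }
    results
}
```

In practice each queue element would be a closure such as `function() get.pull.request.commits(...)`, and `sleep` can be left at its default.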

Edit
I forgot to mention that the function supports multiple source systems. This allows you, for example, to extend your rgithub code to Bitbucket and still manage API rate limiting effectively.

Upvotes: 0
