Reputation: 4514
One of the data sources I extract data from provides access through a REST API in the form of JSON responses. That's great, because the data comes back already structured, i.e., less pain with scraping and parsing unstructured HTML documents.
However, they constrain HTTP traffic with rate limiting: requests per minute/hour/month/IP/user email.
When I was scraping HTML documents with Scrapy, I could easily configure the number of requests per second, delays between subsequent requests, the number of threads, etc. I'll call this the "load strategy". The way it works under the hood is that I generate a number of HTTP requests, Scrapy puts them into a queue, and it processes requests from the queue according to the given "load strategy".
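For example, in Scrapy the whole "load strategy" is just a handful of settings (the values below are placeholders):

```python
# settings.py -- Scrapy throttling knobs (placeholder values)
DOWNLOAD_DELAY = 0.5                # seconds to wait between requests to the same domain
CONCURRENT_REQUESTS = 16            # global cap on concurrent requests
CONCURRENT_REQUESTS_PER_DOMAIN = 4  # per-domain cap
AUTOTHROTTLE_ENABLED = True         # adapt delays to observed server latency
```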
Is there something like that for REST APIs?
To give some context, I'm using a Python REST client generated from the data source's Swagger definitions. The client uses urllib3 under the hood. It provides a way to execute requests asynchronously and a way to configure a thread pool, but it looks like I would need to play around a bit to configure it. I'm looking for an out-of-the-box solution.
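For instance, the generated client exposes something roughly like this (the class and method names here are hypothetical and depend on the Swagger definition; `async_req` is the flag used by openapi-generator Python clients):

```python
# Hypothetical usage of a generated client; names depend on your Swagger definition.
api_instance = ItemsApi(api_client)

# Runs on the client's internal thread pool and returns immediately.
thread = api_instance.get_items(page=1, async_req=True)
result = thread.get()  # block until the response arrives
```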
Upvotes: 0
Views: 210
Reputation: 348
With a generated client you will be able to make requests to the corresponding REST API. However, you'll need to build your own code/logic for inserting delays between requests and for request queuing. Much of the convenience that Scrapy provides will need to be implemented by you, or you'll need to find a tool/package that provides this functionality for you.
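As a rough sketch of the package route, the `ratelimit` package on PyPI gives you a decorator-based throttle; the body of `fetch_page` is a stub standing in for a hypothetical method of your generated client:

```python
import time
from ratelimit import limits, sleep_and_retry

ONE_MINUTE = 60

@sleep_and_retry                      # wait for a free slot instead of raising
@limits(calls=30, period=ONE_MINUTE)  # allow at most 30 calls per minute
def fetch_page(page):
    # Replace this stub with a call to your generated client,
    # e.g. api_instance.get_items(page=page) -- the name is hypothetical.
    return {"page": page, "fetched_at": time.time()}

for page in range(5):
    print(fetch_page(page))           # delays are inserted automatically
```

This only covers the delay part; request queuing and per-IP/per-user limits would still be your own logic.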
Upvotes: 1