Mr.SpyCat

Reputation: 7

How to request multiple links at once and parse them later with Scrapy?

I use Scrapy to get data from an API, but the server is slow. First I scrape one page to collect some IDs and add them to a list. Then I check how many IDs I have and start scraping.

The API accepts at most 10 IDs per request: event_id=1,2,3,4,5,6,7,8,9,10. The problem is that there are many IDs (around 150), so I have to make many requests, and the server takes 3-5 seconds to respond. Is it possible to send all the requests at once and parse the responses later?

import json

import scrapy

match = "https://api.---.com/v1/?token=???&event_id&event_id="
id_list = []  # IDs collected from the first page (assumed to be strings)


class ApiSpider(scrapy.Spider):
    name = 'api'
    allowed_domains = ['api.---.com']
    start_urls = ['https://api.---.com/ids/&token=???']

    def parse(self, response):
        # Collect every ID from the first response, then start requesting details.
        data = json.loads(response.body)
        results = data['results']
        for result in results:
            id_list.append(result['id'])
        yield from self.scrape_start()

    def scrape_start(self):
        if len(id_list) >= 10:
            # Build one URL carrying the first ten IDs, then drop them from the list.
            qq = match + ",".join(id_list[:10])
            yield scrapy.Request(qq, callback=self.parse_product)
            del id_list[0:10]
        elif len(id_list) == 9:
            ...

    def parse_product(self, response):
        data = json.loads(response.body)
        results = data['results']
        for result in results:
            ...

Upvotes: 0

Views: 77

Answers (1)

ruhaib

Reputation: 649

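Try raising CONCURRENT_REQUESTS, which defaults to 16. Because every request here goes to the same domain, CONCURRENT_REQUESTS_PER_DOMAIN (default 8) also limits you, so raise it as well.

As per the Scrapy docs, CONCURRENT_REQUESTS_PER_DOMAIN is:

The maximum number of concurrent (i.e. simultaneous) requests that will be performed to any single domain.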
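
A minimal sketch of what that could look like in the project's settings.py. Since all requests in the question go to one domain, the per-domain limit CONCURRENT_REQUESTS_PER_DOMAIN (default 8) applies in addition to the global one. The value 32 is only an illustrative assumption, not a recommendation for this particular API.

```python
# settings.py
# Both values are illustrative assumptions; tune them for the target server.
CONCURRENT_REQUESTS = 32             # global cap; Scrapy's default is 16
CONCURRENT_REQUESTS_PER_DOMAIN = 32  # per-domain cap; Scrapy's default is 8
```

The same keys can also be set per spider through the custom_settings class attribute if you would rather not change them project-wide.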

Note that raising concurrency too far can cause hardware bottlenecks, so avoid large jumps. I'd recommend increasing the value gradually while watching system stats (CPU/network).
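
Higher concurrency only helps if the spider hands Scrapy all the batch requests up front, so that several can be in flight while the server is slow to answer. Below is a rough sketch of one way to split the IDs into groups of at most 10 and build every URL in one pass. The helper name build_batch_urls and the example.com URL are hypothetical, made up for illustration.

```python
def build_batch_urls(base_url, ids, batch_size=10):
    """Return one URL per batch of at most `batch_size` IDs."""
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # Join the batch into the comma-separated form the API expects.
        urls.append(base_url + ",".join(str(i) for i in batch))
    return urls


# Hypothetical usage inside the spider's parse() callback:
#     for url in build_batch_urls(match, id_list):
#         yield scrapy.Request(url, callback=self.parse_product)

# Placeholder base URL and IDs, for demonstration only.
urls = build_batch_urls("https://api.example.com/v1/?event_id=", list(range(1, 26)))
```

Yielding every request from the loop lets Scrapy's scheduler queue them all and download them in parallel up to the configured concurrency limits, while parse_product still handles each response as it arrives.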

Upvotes: 1
