Hamid Mousavi

Reputation: 223

Scrapy send multiple requests

I'm working on code that must read and process date and time information from a remote JSON endpoint whenever needed. The code I wrote is as follows:

import scrapy

class TimeSpider(scrapy.Spider):
    name = 'getTime'
    allowed_domains = ['worldtimeapi.org']
    start_urls = ['http://worldtimeapi.org']

    def parse(self, response):
        time_json = 'http://worldtimeapi.org/api/timezone/Asia/Tehran'
        # Issue five identical requests; each one should be handled by parse_json
        for i in range(5):
            print(i)
            yield scrapy.Request(url=time_json, callback=self.parse_json)

    def parse_json(self, response):
        print(response.json())

And the output it gives is as follows:

0
1
2
3
4
{'abbreviation': '+0430', 'client_ip': '45.136.231.43', 'datetime': '2022-04-22T22:01:44.198723+04:30', 'day_of_week': 5, 'day_of_year': 112, 'dst': True, 'dst_from': '2022-03-21T20:30:00+00:00', 'dst_offset': 3600, 'dst_until': '2022-09-21T19:30:00+00:00', 'raw_offset': 12600, 'timezone': 'Asia/Tehran', 'unixtime': 1650648704, 'utc_datetime': '2022-04-22T17:31:44.198723+00:00', 'utc_offset': '+04:30', 'week_number': 16}

As you can see, the program calls the parse_json function only once, even though it should be called on every iteration of the loop.

Can anyone help me solve this problem?

Upvotes: 1

Views: 538

Answers (1)

stranac

Reputation: 28236

Additional requests are being dropped by scrapy's default duplicates filter.
The simplest way to avoid this is to pass the dont_filter argument:

yield scrapy.Request(url=time_json, callback=self.parse_json, dont_filter=True)

From the docs:

dont_filter (bool) – indicates that this request should not be filtered by the scheduler. This is used when you want to perform an identical request multiple times, to ignore the duplicates filter. Use it with care, or you will get into crawling loops. Default to False.
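
Applied to the spider from the question, a minimal sketch (same worldtimeapi.org endpoint, loop count, and callback; only the dont_filter argument is added):

import scrapy

class TimeSpider(scrapy.Spider):
    name = 'getTime'
    allowed_domains = ['worldtimeapi.org']
    start_urls = ['http://worldtimeapi.org']

    def parse(self, response):
        time_json = 'http://worldtimeapi.org/api/timezone/Asia/Tehran'
        for i in range(5):
            print(i)
            # dont_filter=True tells the scheduler to skip the duplicates
            # filter, so all five identical requests are actually sent
            yield scrapy.Request(url=time_json, callback=self.parse_json,
                                 dont_filter=True)

    def parse_json(self, response):
        print(response.json())

With this change, parse_json runs once per request, so the JSON payload is printed five times instead of once.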

Upvotes: 2
