Reputation: 493
I have two spiders, and I want one to call the other with information scraped from a page, which is not links that I could follow. Is there any way to do it by calling one spider from another?
To illustrate the problem better: the url of the "one" page has the form /one/{item_name}, where {item_name} is the information I can get from the page /other/
...
<li class="item">item1</li>
<li class="item">someItem</li>
<li class="item">anotherItem</li>
...
Then I have the spider OneSpider that scrapes /one/{item_name}, and the OtherSpider that scrapes /other/ and retrieves the item names, as shown below:
class OneSpider(Spider):
    name = 'one'

    def __init__(self, item_name, *args, **kwargs):
        super(OneSpider, self).__init__(*args, **kwargs)
        self.start_urls = [ f'/one/{item_name}' ]

    def parse(self, response):
        ...
class OtherSpider(Spider):
    name = 'other'
    start_urls = [ '/other/' ]

    def parse(self, response):
        itemNames = response.css('li.item::text').getall()
        # TODO:
        # for each item name
        # scrape /one/{item_name}
        # with the OneSpider
I already checked these two questions: How to call particular Scrapy spiders from another Python script, and scrapy python call spider from spider, as well as several other questions where the main solution is to create another method inside the class and pass it as the callback of new requests. But I don't think that applies here, since these new requests would have customized urls.
Upvotes: 1
Views: 810
Reputation: 3561
Scrapy doesn't provide a way to call one spider from another spider (see the related issue in the Scrapy GitHub repo).
However, you can merge the logic of your two spiders into a single spider class:
import scrapy

class OtherSpider(scrapy.Spider):
    name = 'other'
    start_urls = [ '/other/' ]

    def parse(self, response):
        itemNames = response.css('li.item::text').getall()
        for item_name in itemNames:
            yield scrapy.Request(
                url = f'/one/{item_name}',
                callback = self.parse_item
            )

    def parse_item(self, response):
        # parse method from your OneSpider class
        ...
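One detail to watch: scrapy.Request expects an absolute URL, so a relative path like /one/{item_name} should be resolved against the current page first, e.g. with response.urljoin() inside the spider. A minimal sketch of that resolution using only the standard library (the domain example.com is a placeholder standing in for whatever site the spiders actually target):

```python
from urllib.parse import urljoin

# Hypothetical page URL standing in for response.url inside the spider.
page_url = 'https://example.com/other/'

# Item names as scraped from the li.item elements in the question.
item_names = ['item1', 'someItem', 'anotherItem']

# Resolve each relative path against the page URL, the same way
# response.urljoin() would before yielding the Request.
item_urls = [urljoin(page_url, f'/one/{name}') for name in item_names]
print(item_urls)
```

Inside the spider you would simply yield scrapy.Request(url=response.urljoin(f'/one/{item_name}'), callback=self.parse_item).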
Upvotes: 1