Reputation: 625
import scrapy

class PractiseSpider(scrapy.Spider):
    name = "practise"
    allowed_domains = ["practise.com"]
    start_urls = ['https://practise.com/product/{}/']

    def parse(self, response):
        # do something
        # scrape with the next url in the list
        pass
My list m contains the values that need to be substituted into the URL, i.e. 'product/{}/'.format(m[i]), for each i.
How do I do this? Should I make a new spider call for each URL, or should I write some code so the spider automatically iterates over the list? If the latter, what do I write?
I know there are many answers related to this, e.g. this one, but I have a fixed and known list of URLs.
Upvotes: 1
Views: 1173
Reputation: 21406
As an alternative to overriding start_urls, you can override your spider's start_requests() method. This method yields the requests that start off your spider.
By default your spider does this:
def start_requests(self):
    for url in self.start_urls:
        yield Request(url, dont_filter=True)
so you can modify this method in your spider to do whatever you want:
def start_requests(self):
    ids = pop_ids_from_db()
    for id in ids:
        url = f'http://example.com/product/{id}'
        yield Request(url, dont_filter=True)
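For the question's concrete case, a minimal self-contained sketch could look like this. The IDs in m are invented for illustration, and the fixed list takes the place of pop_ids_from_db above:

import scrapy
from scrapy import Request

class PractiseSpider(scrapy.Spider):
    name = "practise"
    allowed_domains = ["practise.com"]

    # fixed, known list of product IDs (example values)
    m = ['123', '456', '789']

    def start_requests(self):
        # build one request per product ID and hand it to the scheduler
        for product_id in self.m:
            url = f'https://practise.com/product/{product_id}/'
            yield Request(url, dont_filter=True)

    def parse(self, response):
        # called once per product page
        self.logger.info('Scraped %s', response.url)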
Upvotes: 2
Reputation: 10210
If you know the URLs beforehand, just populate start_urls. If m is a list of products (that's what I assume from what you wrote), then it would look like this:
start_urls = ['https://practise.com/product/{}/'.format(product) for product in m]
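For completeness, a minimal spider built this way might look like the following (the IDs in m are example values); Scrapy's default start_requests() then turns each entry of start_urls into a request:

import scrapy

m = ['123', '456', '789']  # fixed, known list of product IDs (example values)

class PractiseSpider(scrapy.Spider):
    name = "practise"
    allowed_domains = ["practise.com"]
    start_urls = ['https://practise.com/product/{}/'.format(product) for product in m]

    def parse(self, response):
        # called once for each URL in start_urls
        self.logger.info('Scraped %s', response.url)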
Upvotes: 3