Reputation: 91
I am new to Scrapy and Python.
In my case, Page A is a set of search-result pages:
http://www.example.com/search?keyword=city&style=1&page=1
http://www.example.com/search?keyword=city&style=1&page=2
http://www.example.com/search?keyword=city&style=1&page=3
The pattern for these URLs is (pages start at 1, so the range runs 1 to 50):
`for i in range(1, 51):
    "http://www.example.com/search?keyword=city&style=1&page=%s" % i`
Page B is a set of detail pages:
http://www.example.com/city_detail_0001.html
http://www.example.com/city_detail_0100.html
http://www.example.com/city_detail_0053.html
There is no pattern for these, because Page B consists of whatever results happen to match the search keyword.
So if I want to grab information from Page B, I first have to use Page A to sift out the links to Page B.
In the past, I usually did this in two steps (roughly like the sketch below):
1. Create spider A and save the Page B links to a txt file.
2. In spider B, read the txt file back into "start_urls".
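For reference, a rough sketch of that old two-step setup (the output file name and the CSS selector are placeholders, not my real code):

from scrapy import Spider

class SpiderA(Spider):
    name = 'spider_a'
    # Page A: generate the 50 search-result URLs
    start_urls = ['http://www.example.com/search?keyword=city&style=1&page=%s' % i
                  for i in range(1, 51)]

    def parse(self, response):
        # step 1: append every detail-page link to a txt file
        # (the selector is an assumption about the site's markup)
        with open('links.txt', 'a') as f:
            for href in response.css('a[href*="city_detail"]::attr(href)').getall():
                f.write(response.urljoin(href) + '\n')

class SpiderB(Spider):
    name = 'spider_b'
    # step 2: read the saved links back into start_urls
    with open('links.txt') as f:
        start_urls = [line.strip() for line in f if line.strip()]

    def parse(self, response):
        pass  # parse the detail page here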
Now, can you please guide me on how I can construct the "start_urls" in a single spider?
Upvotes: 1
Views: 678
Reputation: 18799
The start_requests method is what you need. After that, keep chaining requests and parse the response bodies in callback methods:
from scrapy import Spider, Request

class MySpider(Spider):
    name = 'example'

    def start_requests(self):
        # Page A: the 50 search-result pages
        for i in range(1, 51):
            yield Request('myurl%s' % i, callback=self.parse)

    def parse(self, response):
        # get the Page B links from the search results
        yield Request('pageB', callback=self.parse_my_item)

    def parse_my_item(self, response):
        item = {}
        # real parsing method for my items
        yield item
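The 'pageB' string above is a placeholder for the links you extract from each search page. A minimal sketch of that extraction step, which would replace the parse method above (the CSS selector is hypothetical; adjust it to the real markup):

    def parse(self, response):
        # follow each detail-page link found on Page A;
        # the selector is an assumption about the site's markup
        for href in response.css('a[href*="city_detail"]::attr(href)').getall():
            yield Request(response.urljoin(href), callback=self.parse_my_item)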
Upvotes: 2