Reputation: 31548
I am scraping a job site whose first page has links to all the jobs. Right now I store the title, job, and company from the first page.
But I also want to store the description, which is only available by clicking on the job title. I want to store it together with the other fields of the current item.
This is my current code:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select("//div[@class='jobenteries']")
    items = []
    for site in sites[:3]:
        print "Hello"
        item = DmozItem()
        item['title'] = site.select('a/text()').extract()
        item['desc'] = ''
        items.append(item)
    return items
But the description is on the page that the job link points to. How can I do that?
Upvotes: 0
Views: 293
Reputation: 2254
From the callback for the first page, return a Request for each second page and pass the data collected so far for that item in the request.meta dict. In the callback for the second page you can read the data you passed and return the fully populated item.
See "Passing additional data to callback functions" in the Scrapy docs for more details and an example.
Upvotes: 3