Reputation: 579
I was trying to scrape this link.
I want to pass the category names between two parse methods, but when the Scrapy crawler follows the next page, it raises a KeyError for the category name:
categories_names = response.request.meta['categories_names']
KeyError: 'categories_names'
How do I get the same category's name while following the next page?
# -*- coding: utf-8 -*-
import scrapy


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://www.thomasnet.com/suppliers']

    def parse(self, response):
        li = response.xpath('//div[@class="titled-list titled-list--covid-19-response-section titled-list--dropdown "]/ul/li/a')
        # li = response.xpath('//div[contains(@class, "titled-list--dropdown")]/ul/li/a')
        for each in li:
            categories_links = each.xpath('.//@href').get()
            categories = each.xpath('.//text()').get()
            yield response.follow(url=categories_links, callback=self.parse_li, meta={"categories_names": categories})

    def parse_li(self, response):
        categories_names = response.request.meta['categories_names']
        rows = response.xpath('//header[@class="profile-card__header"]/parent::div')
        for row in rows:
            links = row.xpath('.//header[@class="profile-card__header"]/h2/a/@href').get()
            company_type = row.xpath('.//span[@data-content="Company Type"]/text()[2]').get()
            yield {
                "Links": links,
                "Categories": categories_names,
                "Company Type": company_type if company_type else "N/A"
            }
        next_page = response.xpath('(//*[@class="icon"]/parent::a[@class="page-link"])[2]/@href').get()
        if next_page:
            yield response.follow(url=next_page, callback=self.parse_li)
Upvotes: 0
Views: 145
Reputation: 1
You should access the meta attribute from the response object.
categories_names = response.meta['categories_names']
However, the currently recommended way to do this is to use cb_kwargs.
Upvotes: 0
Reputation: 2564
I've edited my answer since I misunderstood the issue before.
I believe the problem is that parse_li yields new requests recursively, but without assigning the meta params again:
next_page = response.xpath('(//*[@class="icon"]/parent::a[@class="page-link"])[2]/@href').get()
if next_page:
    yield response.follow(url=next_page, callback=self.parse_li)
As far as I can tell, arbitrary data in meta is not propagated to following requests, so you will need to reassign it:
yield response.follow(
    url=next_page,
    callback=self.parse_li,
    meta={"categories_names": categories_names}
)
Consider taking a look at cb_kwargs in the future. It has been the recommended parameter for passing arbitrary data between requests since Scrapy v1.7; you can check it out here. (It works slightly differently from meta, though.)
Upvotes: 2