Reputation: 93
I'm using the scrapy framework for a web scraping project but I can't seem to figure out how to get a custom output processor to work.
I have an item class like so:
import scrapy

class Item(scrapy.Item):
    ad_type = scrapy.Field()
Then my parse function looks something like this. I have two scraped strings that I am adding to ad_type. I want my output processor function to assign tags based on what is scraped from these two XPaths.
def parse(self, response):
    l = ItemLoader(item=Item(), selector=listing)
    l.add_xpath('ad_type', '(.//div/@class)[1]')
    l.add_xpath('ad_type', '(.//div[contains(@class, "brand")]/@class)[1]')
    yield l.load_item()
How do I get my output processor function to access the two XPath-scraped strings that I have added to ad_type? The Scrapy docs give this example, but I can't get it to work.
def lowercase_processor(self, values):
    for v in values:
        yield v.lower()

class MyItemLoader(ItemLoader):
    name_in = lowercase_processor
Upvotes: 1
Views: 650
Reputation: 28256
You have named your loader MyItemLoader, but your spider uses ItemLoader (probably Scrapy's).
If you update your code to use the custom loader, you should get the result you want.
I would also recommend not naming your item class Item, since that could be confusing.
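For illustration, here is a rough sketch of what the custom loader might look like for the ad_type field. Note that the docs example declares name_in, which is an input processor for a field called name; an output processor for ad_type would be declared as ad_type_out instead. The names AdLoader and tag_ad_type and the "branded"/"standard" rule are made up for this example, not taken from the question, and the import fallback is only there so the processor can be tried without Scrapy installed.

```python
try:
    from scrapy.loader import ItemLoader
except ImportError:
    # Fallback so the processor itself can be tried without Scrapy installed.
    ItemLoader = object


def tag_ad_type(self, values):
    """Output processor: receives every value collected for ad_type."""
    # Hypothetical rule: any class string mentioning "brand" marks a branded ad.
    if any("brand" in v for v in values):
        return "branded"
    return "standard"


class AdLoader(ItemLoader):
    # <field>_out declares an output processor; <field>_in an input processor.
    ad_type_out = tag_ad_type


# In the spider, build the loader from the subclass, not the base class:
#     l = AdLoader(item=Item(), selector=listing)
```

Since both add_xpath calls target ad_type, the strings they extract should arrive together in values when load_item() runs, so the processor can look at both before deciding on a tag.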
Upvotes: 2