Jeroen Vermunt

Reputation: 880

Scrapy: use other fields of an item in an input processor

I am requesting address information from a web service to cross-check whether the address I already have is in the same format as the one the web service returns.

For this I have the following item with the following input_processor:


import scrapy
from scrapy.loader.processors import MapCompose, TakeFirst


def MyFunction(scraped_addition):
    # Problem: `addition` is another field of the item and is not defined here.
    if scraped_addition == addition:
        return scraped_addition
    else:
        return None


class AdresItem(scrapy.Item):
    postal_code = scrapy.Field()
    house_number = scrapy.Field()
    addition = scrapy.Field()
    scraped_addition = scrapy.Field(
        input_processor=MapCompose(MyFunction),
        output_processor=TakeFirst(),
    )

Of course, I can't access the original addition this way. What would be a good way to use another field of the item inside the input processor?

Upvotes: 1

Views: 316

Answers (1)

SuperUser

Reputation: 4822

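Store the value in the loader's context and read it in the processor function. A processor that declares a loader_context argument receives the currently active loader context when it is called.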

Example:

import scrapy
from scrapy.loader import ItemLoader
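# Newer Scrapy releases provide these processors in the itemloaders package:
# from itemloaders.processors import MapCompose
from scrapy.loader.processors import MapCompose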


def MyFunction(scraped_addition, loader_context):
    addition = loader_context.get('addition')
    if scraped_addition == addition:
        return scraped_addition
    else:
        return None


class ExampleItem(scrapy.Item):
    scraped_addition = scrapy.Field(input_processor=MapCompose(MyFunction))


class ExampleSpider(scrapy.Spider):
    name = 'exampleSpider'
    start_urls = ['https://scrapingclub.com/exercise/detail_basic/']

    def parse(self, response):
        l = ItemLoader(item=ExampleItem(), response=response)
        l.context['addition'] = 'Long-sleeved Jersey Top'
        l.add_xpath('scraped_addition', '//h3/text()')
        yield l.load_item()
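
Because the processor only calls loader_context.get, its logic can be checked without running a spider by passing a plain dict in place of the loader context. The following is a minimal sketch under that assumption; keep_if_matches is a hypothetical name for the same function as MyFunction above, and the values 'A' and 'B' are made up for illustration.

```python
def keep_if_matches(scraped_addition, loader_context):
    """Return the scraped value only if it equals the known addition."""
    # 'addition' is the key the spider sets via l.context['addition'].
    addition = loader_context.get('addition')
    return scraped_addition if scraped_addition == addition else None


# Assumption: a plain dict stands in for the loader context here.
# Inside Scrapy, the ItemLoader supplies the context to the processor.
context = {'addition': 'A'}

matching = keep_if_matches('A', context)   # same as the known addition
different = keep_if_matches('B', context)  # differs from the known addition

print(matching, different)
```

MapCompose discards None results, so in the spider a non-matching value simply leaves scraped_addition unset on the loaded item.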

Upvotes: 2
