Svarto

Reputation: 643

Item Loader not working with response.meta

I want to load two items into an item loader that is passed between callbacks through response.meta. Somehow, the standard:

loader.add_xpath('item', 'xpath')

is not working (i.e. no value is saved or written; it is as if the 'item' was never created), but with the exact same expression the following:

response.xpath('xpath')
loader.add_value('item', value)

works. Anyone know why? Complete code below:

Spider.py

def parse(self, response):
    for record in response.xpath('//div[@class="box list"]/div[starts-with(@class,"record")]'):
        loader = BaseItemLoader(item=BezrealitkyItems(), selector=record)
        loader.add_xpath('title','.//div[@class="details"]/h2/a[@href]/text()')
        listing_url = record.xpath('.//div[@class="details"]/p[@class="short-url"]/text()').extract_first()
        yield scrapy.Request(listing_url, meta={'loader' : loader}, callback=self.parse_listing)

def parse_listing(self, response):
    loader = response.meta['loader']
    loader.add_value('url', response.url)
    loader.add_xpath('lat','//script[contains(.,"recordGps")]',re=r'(?:"lat":)[0-9]+\.[0-9]+')
    return loader.load_item()

The above does not work. When I try this, though, it works:

    lat_coords = response.xpath('//script[contains(.,"recordGps")]/text()').re(r'(?:"lat":)([0-9]+\.[0-9]+)')
    loader.add_value('lat', lat_coords)
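(As a side note, the two regexes above also differ: the working one has a capture group around the number, the failing one does not. Scrapy's `.re()` behaves like `re.findall`, returning capture groups when present and full matches otherwise. A quick stdlib check, using a made-up page snippet:)

```python
import re

# Hypothetical snippet; the real page embeds the coordinates in a script tag.
html = '<script>var recordGps = {"lat":50.0755,"lng":14.4378};</script>'

# No capture group: findall returns the full match, '"lat":' prefix included.
no_group = re.findall(r'(?:"lat":)[0-9]+\.[0-9]+', html)

# One capture group: findall returns just the number, which is also what
# Scrapy's .re() would hand to the loader.
with_group = re.findall(r'(?:"lat":)([0-9]+\.[0-9]+)', html)

print(no_group)    # ['"lat":50.0755']
print(with_group)  # ['50.0755']
```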

My item.py has nothing special:

class BezrealitkyItems(scrapy.Item):
    title = scrapy.Field()
    url = scrapy.Field()
    lat = scrapy.Field()
class BaseItemLoader(ItemLoader):
    title_in = MapCompose(lambda v: v.strip(), Join(''), unidecode)
    title_out = TakeFirst()

Just to clarify, I get no error message. It is just that the 'lat' field is never created and nothing is scraped into it. The other fields are scraped fine, including the url that is also added in the parse_listing function.

Upvotes: 1

Views: 993

Answers (1)

Granitosaurus

Reputation: 21436

It happens because you are carrying over a loader reference which has its own selector object.
Here you create the loader and bind its selector to your record reference:

loader = BaseItemLoader(item=BezrealitkyItems(), selector=record)

Later you put this loader into your Request.meta attribute and carry it over to the next parse method. What you aren't doing, though, is updating the selector context once you retrieve the loader from meta:

loader = response.meta['loader']
# if you check loader.selector you'll see that it still has html body
# set in previous method, i.e. selector of record in your case
loader.selector = Selector(response)  # <--- this is missing

This would work, but it should be avoided: keeping complex objects with a lot of references in meta is a bad idea and can cause all kinds of errors, mostly related to the Twisted framework (which scrapy uses for its concurrency).
What you should do instead is load the item and recreate the loader at every step:

def parse(self, response):
    loader = BaseItemLoader(item=BezrealitkyItems(), selector=response)
    # ... add fields available on this page ...
    yield scrapy.Request('some_url', meta={'item': loader.load_item()},
                         callback=self.parse2)

def parse2(self, response):
    # rebuild the loader around the carried item and the new response
    loader = BaseItemLoader(item=response.meta['item'], selector=response)
    # ... add fields available on this page ...
    yield loader.load_item()

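The key point is that load_item() returns plain data, so each callback can safely rebuild a loader around it and only add the new fields. A minimal stdlib sketch of that hand-off (hypothetical field names, no Scrapy required):

```python
# Step 1 yields plain data (what loader.load_item() returns) through meta.
step1_item = {'title': 'Byt 2+kk, Praha'}

def add_fields(item, **new_fields):
    """Mimics rebuilding a loader around the carried item: copy, then add."""
    merged = dict(item)        # the carried item itself stays untouched
    merged.update(new_fields)  # like add_value() calls on the new loader
    return merged

# Step 2 receives the item via meta and adds the detail-page fields.
final_item = add_fields(step1_item, url='http://example.com/1', lat='50.0755')
print(final_item)
```

Plain dicts (or loaded Items) carry no selector or response references, which is why they are safe to pass through meta where a live loader is not.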
Upvotes: 2
