Scrapy: Item Loader and KeyError even when Key is defined

Question

Intention / expected behaviour

From the page https://www.bezrealitky.cz/vypis/nabidka-prodej/byt/praha I want the text of the links, both as CSV output and in the shell.

Error

I get KeyError: 'title', even though I have defined the key in the item loader in Item.py.

Full Traceback

Traceback (most recent call last):
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 22, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\phili\Documents\Python Scripts\Scrapy Spiders\bezrealitky\bezrealitky\spiders\bezrealitky_spider.py", line 33, in parse
    yield loader.load_item()
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 115, in load_item
    value = self.get_output_value(field_name)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 122, in get_output_value
    proc = self.get_output_processor(field_name)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 144, in get_output_processor
    self.default_output_processor)
  File "C:\Users\phili\Anaconda3\envs\py35\lib\site-packages\scrapy\loader\__init__.py", line 154, in _get_item_field_attr
    value = self.item.fields[field_name].get(key, default)
KeyError: 'title'

Spider.py

def parse(self, response):
    # Each listing sits in an element whose class starts with "record".
    for record in response.xpath('//*[starts-with(@class,"record")]'):
        loader = BaseItemLoader(selector=record)
        loader.add_xpath('title', './/div[@class="details"]/h2/a[@href]/text()')
        yield loader.load_item()

Item.py - Itemloader

class BaseItemLoader(ItemLoader):
    title_in = MapCompose(unidecode)

Conclusion

I am at a bit of a loss. I believe I followed the Scrapy manual: I defined the item loader and declared the key through "title_in", yet load_item() raises the KeyError when the item is yielded. I checked in the shell that the XPath returns the text I want, so that part at least works. Any help is appreciated!

mizhgun · Accepted Answer

Even when you use an ItemLoader, you must define an Item class first and then hand it to the loader, either by setting it as the loader's default_item_class attribute:

class CustomItemLoader(ItemLoader):
    default_item_class = MyItem

or by passing an instance of it to the loader's constructor:

l = CustomItemLoader(item=MyItem())

otherwise the item loader knows nothing about the item and its fields, so the lookup of 'title' in the item's fields raises the KeyError.
