Grevioos

Reputation: 405

Scrapy KeyError while processing

I couldn't find any answer to my problem so I hope it will be ok to ask here.

I am trying to scrape cinema shows and keep getting a KeyError (the traceback was posted as a screenshot).

What really confuses me is that the problem apparently lies in the pipeline. However, I have a second spider for an opera house with exactly the same code (only the place is different), and it works just fine. "Shows" and "Place" refer to my Django models. I've changed their fields to CharFields, so it's not a problem with a wrong date/time format.

I also tried using a dedicated Scrapy item, "KikaItem", instead of "ShowItem" (which is shared with my opera spider), but the error remains.

class ScrapyKika(object):
    def process_item(self, ShowItem, spider):
        place, created = Place.objects.get_or_create(name="kino kika")

        show = Shows.objects.update_or_create(
            time=ShowItem["time"],
            date=ShowItem["date"],
            place=place,
            defaults={'title': ShowItem["title"]}
        )

        return ShowItem

Here is my spider code. I expect the problem is somewhere here, because I used a different approach than in the opera one. However, I am not sure what is wrong.

import scrapy
from ..items import ShowItem, KikaItemLoader

class KikaSpider(scrapy.Spider):
    name = "kika"
    allowed_domains = ["http://www.kinokika.pl/dk.php"]
    start_urls = [
        "http://www.kinokika.pl/dk.php",
    ]

    def parse(self, response):
        divs = response.xpath('//b')
        for div in divs:
            l = KikaItemLoader(item=ShowItem(), response=response)
            l.add_xpath("title", "./text()")
            l.add_xpath("date", "./ancestor::ul[1]/preceding-sibling::h2[1]/text()")
            l.add_xpath("time", "./preceding-sibling::small[1]/text()")
            return l.load_item()

ItemLoader

class KikaItemLoader(ItemLoader):
    title_in = MapCompose(strip_string,lowercase)
    title_out = Join()

    time_in = MapCompose(strip_string)
    time_out = Join()

    date_in = MapCompose(strip_string)
    date_out = Join()

Thank you for your time and sorry for any misspellings :)

Upvotes: 1

Views: 918

Answers (1)

alecxe

Reputation: 473903

Currently, your spider yields a single item:

{'title': u'  '}

which does not have the date and time fields filled out. This is because of the way you initialize the ItemLoader class in your spider.
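That missing data is what surfaces as the KeyError in the pipeline: subscripting an item with a field that was never populated (for example ShowItem["time"]) raises KeyError, just as it would on a plain dict. A minimal sketch of a defensive check, assuming the three field names from the question; the helper name missing_fields is hypothetical and not part of Scrapy:

```python
# Hypothetical helper (not a Scrapy API): report which required fields
# are absent or blank before the pipeline subscripts them.
REQUIRED_FIELDS = ("title", "date", "time")


def missing_fields(item, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or blank in item."""
    # item only needs a dict-like .get(), which Scrapy items also provide.
    return [
        name for name in required
        if not str(item.get(name) or "").strip()
    ]


if __name__ == "__main__":
    incomplete = {"title": "  "}  # the item the spider currently produces
    print(missing_fields(incomplete))
```

Inside process_item, a non-empty result could be handled by raising scrapy.exceptions.DropItem instead of letting the database call fail.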

You should be initializing your item loader with a specific selector in mind. Replace:

for div in divs:
    l = KikaItemLoader(item=ShowItem(), response=response)

with:

for div in divs:
    l = KikaItemLoader(item=ShowItem(), selector=div)

Upvotes: 2
