Reputation: 405
I couldn't find any answer to my problem so I hope it will be ok to ask here.
I am trying to scrap cinema shows and still getting following error.
What is really confusing for me that the problem apparently lies in pipelines. However, I have second spider for opera house with the exact same code(only place is different) and it works just fine."Shows" and "Place" refers to my Django models. I've changed their fields to be CharFields so it's not a problem with wrong date/time format.
I also tried to use dedicated scrapy item "KikaItem" instead of "ShowItem" (which is shared with my opera spider) but the error still remains.
class ScrapyKika(object):
def process_item(self, ShowItem, spider):
place, created = Place.objects.get_or_create(name="kino kika")
show = Shows.objects.update_or_create(
time=ShowItem["time"],
date=ShowItem["date"],
place=place,
defaults={'title': ShowItem["title"]}
)
return ShowItem
Here is my spider code.I expect the problem is somewhere here, because I used a different approach here than in the opera one. However,I am not sure what can be wrong.
import scrapy
from ..items import ShowItem, KikaItemLoader
class KikaSpider(scrapy.Spider):
name = "kika"
allowed_domains = ["http://www.kinokika.pl/dk.php"]
start_urls = [
"http://www.kinokika.pl/dk.php"
]
def parse(self, response):
divs = response.xpath('//b')
for div in divs:
l = KikaItemLoader(item=ShowItem(), response=response)
l.add_xpath("title", "./text()")
l.add_xpath("date", "./ancestor::ul[1]/preceding-sibling::h2[1]/text()")
l.add_xpath("time", "./preceding-sibling::small[1]/text()")
return l.load_item()
ItemLoader
class KikaItemLoader(ItemLoader):
title_in = MapCompose(strip_string,lowercase)
title_out = Join()
time_in = MapCompose(strip_string)
time_out = Join()
date_in = MapCompose(strip_string)
date_out = Join()
Thank you for your time and sorry for any misspellings :)
Upvotes: 1
Views: 918
Reputation: 473903
Currently, your spider yields a single item:
{'title': u' '}
which does not have the date
and time
fields filled out. This is because of the way you initialize the ItemLoader
class in your spider.
You should be initializing your item loader with a specific selector in mind. Replace:
for div in divs:
l = KikaItemLoader(item=ShowItem(), response=response)
with:
for div in divs:
l = KikaItemLoader(item=ShowItem(), selector=div)
Upvotes: 2