Reputation: 13
My problem is: I have a list (HTML li elements) on the main page, and for each entry in the list I want to open another page, extract some information, combine everything into a single item, and then move on to the next entry in the main page's list. I wrote the code below, but I'm new to Python and Scrapy and have run into some difficulties.
I came up with this solution, but it generates two items for each element of the main list.
    class BoxSpider(scrapy.Spider):
        name = "mag"
        start_urls = [
            "http://www.example.com/index.html"
        ]

        def secondPage(self, response):
            secondPageItem = CinemasItem()
            secondPageItem['trailer'] = 'trailer'
            secondPageItem['synopsis'] = 'synopsis'
            yield secondPageItem

        def parse(self, response):
            for sel in response.xpath('//*[@id="conteudoInternas"]/ul/li'):
                item = CinemasItem()
                item['title'] = 'title'
                item['room'] = 'room'
                item['mclass'] = 'mclass'
                item['minAge'] = 'minAge'
                item['cover'] = 'cover'
                item['sessions'] = 'sessions'
                secondUrl = sel.xpath('p[1]/a/@href').extract()[0]
                yield item
                yield scrapy.Request(url=secondUrl, callback=self.secondPage)
Can someone help me generate just one item with the 'title', 'room', 'mclass', 'minAge', 'cover', 'sessions', 'trailer' and 'synopsis' fields all filled in, instead of one item with 'title', 'room', 'mclass', 'minAge', 'cover' and 'sessions' filled in and another with only 'trailer' and 'synopsis'?
Upvotes: 1
Views: 86
Reputation: 474191
You need to pass the item instantiated in parse() inside the meta to the secondPage callback:
    def parse(self, response):
        for sel in response.xpath('//*[@id="conteudoInternas"]/ul/li'):
            item = CinemasItem()
            item['title'] = 'title'
            item['room'] = 'room'
            item['mclass'] = 'mclass'
            item['minAge'] = 'minAge'
            item['cover'] = 'cover'
            item['sessions'] = 'sessions'
            secondUrl = sel.xpath('p[1]/a/@href').extract()[0]
            # see: we are passing the item inside the meta
            yield scrapy.Request(url=secondUrl, meta={'item': item}, callback=self.secondPage)

    def secondPage(self, response):
        # see: we are getting the item from meta
        item = response.meta['item']
        item['trailer'] = 'trailer'
        item['synopsis'] = 'synopsis'
        yield item
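To see why this yields exactly one item per list entry, here is a minimal sketch of the same flow without Scrapy or any network access. FakeRequest, FakeResponse and crawl are hypothetical stand-ins written only for this illustration (they are not Scrapy classes), and the titles and URLs are made-up sample data. The point is that parse() yields only a request carrying the half-filled item, and the item is yielded once, by the second callback, after it has been completed.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: URL, callback and meta dict."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response; exposes the meta of its request."""
    url: str
    meta: dict


def parse(listing):
    # One partially filled item per list entry; it is NOT yielded here.
    for title, detail_url in listing:
        item = {"title": title}
        yield FakeRequest(url=detail_url, callback=second_page, meta={"item": item})


def second_page(response):
    # Pick up the same item object, complete it, and yield it once.
    item = response.meta["item"]
    item["synopsis"] = "synopsis for " + response.url
    yield item


def crawl(listing):
    """Toy scheduler: follow requests, run their callbacks, collect items."""
    items = []
    pending = list(parse(listing))
    while pending:
        result = pending.pop(0)
        if isinstance(result, FakeRequest):
            response = FakeResponse(url=result.url, meta=result.meta)
            pending.extend(result.callback(response))
        else:
            items.append(result)
    return items


# Made-up sample data: two list entries, each with its own detail page.
items = crawl([("Movie A", "/a.html"), ("Movie B", "/b.html")])
for item in items:
    print(item)
```

Each entry of the list ends up as a single dictionary holding both the main-page field and the detail-page field. A new item is created on every loop iteration, so the requests do not share state with each other.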
Upvotes: 1