Reputation: 425
def parse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)

    # Page-level fields shared by every row of the table
    l = MytemsLoader()
    l.add_value('main1', some xpath)
    l.add_value('main2', some xpath)
    l.add_value('main3', some xpath)

    rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        # Row-level fields are added to the same loader on every iteration
        l.add_value('table1', some xpath based on rows)
        l.add_value('table2', some xpath based on rows)
        l.add_value('main3', some xpath based on rows)
        yield l.load_item()
I am using an ItemLoader because I want to preprocess these fields and handle null values easily. Each row of the table is supposed to be an entity that has the main1, main2, main3, etc. fields plus its own fields. However, the code above keeps reusing the same loader l, so only the last row is returned for each main page.
Question: how can I combine the main page data with each table row entry using an ItemLoader? If I used two item loaders, one for each section, how could they be combined?
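A plain-Python analogy may help show the symptom described above. This is only a sketch: the generator shared_record and its field values are hypothetical and do not use Scrapy at all. It yields the same mutable dict on every iteration, so every result ends up showing the last row.

```python
def shared_record(rows):
    """Yield one record per row, but reuse a single dict for all of them."""
    record = {"main1": "page title"}  # hypothetical page-level field
    for row in rows:
        record["table1"] = row  # overwrites the value set for the previous row
        yield record  # the very same dict object is yielded each time


rows = ["row A", "row B", "row C"]
results = list(shared_record(rows))

# All three results are references to one dict holding the last row's value.
distinct_objects = len({id(r) for r in results})
last_values = [r["table1"] for r in results]
```

Whether the loader in the question fails in exactly this way depends on its processors, but the shared state is the common cause.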
For future reference:
def newparse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)

    # Load the page-level fields once
    ml = MyitemLoader()
    ml.add_value('main1', some xpath)
    ml.add_value('main2', some xpath)
    ml.add_value('main3', some xpath)
    main_item = ml.load_item()

    rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        # New loader per row, seeded with the page-level item
        bl = MyitemLoader(item=main_item, selector=row)
        bl.add_value('table1', some xpath based on row)
        bl.add_value('table2', some xpath based on row)
        bl.add_value('main3', some xpath based on row)
        yield bl.load_item()
Upvotes: 2
Views: 2008
Reputation: 473943
You need to instantiate a new ItemLoader inside the loop, passing the already loaded item through the item argument:
# Load the page-level fields once
l = MytemsLoader()
l.add_value('main1', some xpath)
l.add_value('main2', some xpath)
l.add_value('main3', some xpath)
item = l.load_item()

rows = response.xpath("table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
for row in rows:
    # New loader per row, seeded with the page-level item
    l = MytemsLoader(item=item)
    l.add_value('table1', some xpath based on rows)
    l.add_value('table2', some xpath based on rows)
    l.add_value('main3', some xpath based on rows)
    yield l.load_item()
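One caveat worth checking: a loader created with item=item fills in that same item object, so rows seeded from one shared item may still share state. A minimal plain-Python sketch of the safer pattern follows, assuming the page-level fields can be copied. The function rows_with_main and the sample field values are hypothetical, and the dicts are only stand-ins for Scrapy items.

```python
from copy import deepcopy


def rows_with_main(main_fields, rows):
    """Yield one independent record per row, each carrying the page-level fields."""
    for row_fields in rows:
        record = deepcopy(main_fields)  # fresh copy, so rows never share state
        record.update(row_fields)       # add this row's own fields
        yield record


main_fields = {"main1": "page title", "main2": "page date"}
rows = [
    {"table1": "a1", "table2": "a2"},
    {"table1": "b1", "table2": "b2"},
]
records = list(rows_with_main(main_fields, rows))
```

In the Scrapy code above, the equivalent would presumably be to pass a copy of the loaded item (for example item.copy()) to each row's loader, if the items turn out to be shared between rows.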
Upvotes: 4