codervince

scrapy itemloaders return list of items

from scrapy import Request
from scrapy.linkextractors import LinkExtractor

def parse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)

    l = MytemsLoader(response=response)
    l.add_xpath('main1', 'some xpath')
    l.add_xpath('main2', 'some xpath')
    l.add_xpath('main3', 'some xpath')

    rows = response.xpath("//table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        # The same loader l is reused for every row
        l.add_xpath('table1', 'some xpath based on rows')
        l.add_xpath('table2', 'some xpath based on rows')
        l.add_xpath('main3', 'some xpath based on rows')
        yield l.load_item()

I'm using an ItemLoader because I want to preprocess these fields and handle null values easily. Each table row is supposed to become its own item containing the main1, main2, main3, ... fields plus its own row fields. However, because the code above reuses the single loader l for every row, I only get the last row for each main page.

Question: how can I combine the main-page data with each table row using an ItemLoader? If I used two item loaders, one for each section, how could I combine them?

For future reference:

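from scrapy import Request
from scrapy.linkextractors import LinkExtractor

def newparse(self, response):
    for link in LinkExtractor(restrict_xpaths="BLAH").extract_links(response)[:-1]:
        yield Request(link.url)

    # Load the main-page fields once
    ml = MyitemLoader(response=response)
    ml.add_xpath('main1', 'some xpath')
    ml.add_xpath('main2', 'some xpath')
    ml.add_xpath('main3', 'some xpath')
    main_item = ml.load_item()

    rows = response.xpath("//table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
    for row in rows:
        # Fresh loader per row: seeded with a copy of the main item (load_item()
        # fills and returns the item it was given) and scoped to the row selector
        bl = MyitemLoader(item=main_item.copy(), selector=row)
        bl.add_xpath('table1', 'some xpath based on row')
        bl.add_xpath('table2', 'some xpath based on row')
        bl.add_xpath('main3', 'some xpath based on row')
        yield bl.load_item()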

Answers (1)

alecxe

You need to instantiate a new ItemLoader inside the loop, passing the main-page item as the item argument:

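# inside parse(self, response):
l = MytemsLoader(response=response)
l.add_xpath('main1', 'some xpath')
l.add_xpath('main2', 'some xpath')
l.add_xpath('main3', 'some xpath')
item = l.load_item()

rows = response.xpath("//table[@id='BLAH']/tbody[contains(@id, 'BLOB')]")
for row in rows:
    # New loader per row, seeded with a copy of the main-page item so the rows
    # don't all write into the same object, and scoped to the row selector
    l = MytemsLoader(item=item.copy(), selector=row)

    l.add_xpath('table1', 'some xpath based on rows')
    l.add_xpath('table2', 'some xpath based on rows')
    l.add_xpath('main3', 'some xpath based on rows')

    yield l.load_item()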
