Scrapy merging to 1 list

Question

I've built my first Scrapy project but can't get past the last hurdle. With the script below I get one long list in the CSV: first all the product prices, and then all the product names.

What I would like is for every product to have its price next to it. For example:

Product Name, Product Price
Product Name, Product Price

My Scrapy project:

Items.py

from scrapy.item import Item, Field


class PrijsvergelijkingItem(Item):
    Product_ref = Field()
    Product_price = Field()

My Spider called nvdb.py:

from scrapy.spider import BaseSpider
import scrapy.selector
from Prijsvergelijking.items import PrijsvergelijkingItem

class MySpider(BaseSpider):

    name = "nvdb"
    allowed_domains = ["vandenborre.be"]
    start_urls = ["http://www.vandenborre.be/tv-lcd-led/lcd-led-tv-80-cm-alle-producten"]

    def parse(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//ul[@id='prodlist_ul']")
        items = []
        for titles in titles:
            item = PrijsvergelijkingItem()
            item["Product_ref"] = titles.xpath("//div[@class='prod_naam']//text()[2]").extract()
            item["Product_price"] = titles.xpath("//div[@class='prijs']//text()[2]").extract()
            items.append(item)
        return items

majin · Accepted Answer

I am not sure whether this will help, but you can use OrderedDict from collections, which keeps the fields of each item in the order you insert them.

from scrapy.spider import BaseSpider
import scrapy.selector
from collections import OrderedDict
from Prijsvergelijking.items import PrijsvergelijkingItem

class MySpider(BaseSpider):

    name = "nvdb"
    allowed_domains = ["vandenborre.be"]
    start_urls = ["http://www.vandenborre.be/tv-lcd-led/lcd-led-tv-80-cm-alle-producten"]

    def parse(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//ul[@id='prodlist_ul']")
        items = []
        for titles in titles:
            item = OrderedDict(PrijsvergelijkingItem())
            item["Product_ref"] = titles.xpath("//div[@class='prod_naam']//text()[2]").extract()
            item["Product_price"] = titles.xpath("//div[@class='prijs']//text()[2]").extract()
            items.append(item)
        return items
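
To see what OrderedDict changes here, the following is a minimal standalone sketch that uses only the standard library. The sample name and price are made up for illustration, and build_item is a hypothetical helper, not part of Scrapy.

```python
from collections import OrderedDict


def build_item(name, price):
    """Return an OrderedDict whose keys keep their insertion order."""
    item = OrderedDict()
    item["Product_ref"] = name      # inserted first, so iterated first
    item["Product_price"] = price   # inserted second, so iterated second
    return item


# Made-up sample values, purely for illustration.
item = build_item("Example TV 32 inch", "299.00")

# Keys come back in the order they were inserted.
field_order = list(item.keys())
print(field_order)
```

This only controls the order of the fields within one item. It does not by itself split a list of names and a list of prices into one row per product.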

Also, you might have to change the way you iterate over the dict:

for od in items:
    for key,value in od.items():
        print key,value
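
If each field ends up holding a complete list, as described in the question, one possible way to put every name next to its price is to pair the two lists by position. This is only a sketch, under the assumption that both lists have the same length and the same order. pair_rows and the sample values are hypothetical, and the CSV is written to an in-memory buffer rather than through Scrapy's own exporter.

```python
import csv
import io
from collections import OrderedDict


def pair_rows(item):
    """Pair names and prices by position, assuming both lists line up."""
    names = [name.strip() for name in item["Product_ref"]]
    prices = [price.strip() for price in item["Product_price"]]
    return list(zip(names, prices))


# Made-up data shaped like the lists that extract() returns.
item = OrderedDict()
item["Product_ref"] = ["TV A ", "TV B"]
item["Product_price"] = ["199.00", " 249.00"]

rows = pair_rows(item)

# Write one "name,price" line per product into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that zip stops at the shorter list, so a product without a price would silently shift or drop rows. Selecting each product's own element first and reading its name and price from there would avoid that, but it depends on the page markup, which is not shown here.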
