Wouter

Reputation: 173

Scrapy merging to 1 list

I've built my first Scrapy project but can't get past the last hurdle. With the script below I get one long list in the CSV: first all the product prices and then all the product names.

What I would like is for every product to have its price next to it. For example:

Product Name, Product Price
Product Name, Product Price

My scrapy project:

items.py

from scrapy.item import Item, Field


class PrijsvergelijkingItem(Item):
    Product_ref = Field()
    Product_price = Field()

My Spider called nvdb.py:

from scrapy.spider import BaseSpider
import scrapy.selector
from Prijsvergelijking.items import PrijsvergelijkingItem

class MySpider(BaseSpider):

    name = "nvdb"
    allowed_domains = ["vandenborre.be"]
    start_urls = ["http://www.vandenborre.be/tv-lcd-led/lcd-led-tv-80-cm-alle-producten"]

    def parse(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//ul[@id='prodlist_ul']")
        items = []
        for titles in titles:
            item = PrijsvergelijkingItem()
            item["Product_ref"] = titles.xpath("//div[@class='prod_naam']//text()[2]").extract()
            item["Product_price"] = titles.xpath("//div[@class='prijs']//text()[2]").extract()
            items.append(item)
        return items

Upvotes: 1

Views: 81

Answers (2)

majin

Reputation: 684

I am not sure whether this will help, but you can use OrderedDict from collections for this.

from scrapy.spider import BaseSpider
import scrapy.selector
from collections import OrderedDict
from Prijsvergelijking.items import PrijsvergelijkingItem

class MySpider(BaseSpider):

    name = "nvdb"
    allowed_domains = ["vandenborre.be"]
    start_urls = ["http://www.vandenborre.be/tv-lcd-led/lcd-led-tv-80-cm-alle-producten"]

    def parse(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//ul[@id='prodlist_ul']")
        items = []
        for titles in titles:
            item = OrderedDict(PrijsvergelijkingItem())
            item["Product_ref"] = titles.xpath("//div[@class='prod_naam']//text()[2]").extract()
            item["Product_price"] = titles.xpath("//div[@class='prijs']//text()[2]").extract()
            items.append(item)
        return items

You might also have to change the way you iterate over the dicts:

for od in items:
    for key, value in od.items():
        print(key, value)

Upvotes: 0

alecxe

Reputation: 473873

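You need to make your XPath expressions work in the context of each "product". An expression that starts with // searches the whole document, even when it is called on a sub-selector, so every item ends up with all the names and all the prices. To make the search relative to the current node, prepend a dot to the expressions: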

def parse(self, response):
    products = response.xpath("//ul[@id='prodlist_ul']/li")
    for product in products:
        item = PrijsvergelijkingItem()
        item["Product_ref"] = product.xpath(".//div[@class='prod_naam']//text()[2]").extract_first()
        item["Product_price"] = product.xpath(".//div[@class='prijs']//text()[2]").extract_first()
        yield item

I've also improved the code a little bit:

  • I assume you meant to iterate over list items ul->li and not just ul - fixed the expression
  • used the response.xpath() shortcut method
  • used extract_first() instead of extract()
  • improved the variable naming
  • used yield instead of collecting items in a list and then returning
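
If you want to see the scoping effect without running the spider, here is a minimal sketch. It makes several assumptions: it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, the markup in PAGE is made up to mimic the structure implied by the XPath expressions, and parse_products is a hypothetical helper name. It is not the real page and not Scrapy's API.

```python
import xml.etree.ElementTree as ET

# Made-up markup (an assumption), shaped like the list the XPath targets.
PAGE = """
<html><body>
  <ul id="prodlist_ul">
    <li><div class="prod_naam">TV A</div><div class="prijs">199</div></li>
    <li><div class="prod_naam">TV B</div><div class="prijs">249</div></li>
  </ul>
</body></html>
""".strip()

root = ET.fromstring(PAGE)


def parse_products(root):
    """Yield one dict per <li>, searching for name and price inside that <li> only."""
    for product in root.findall(".//ul[@id='prodlist_ul']/li"):
        yield {
            # findtext returns the text of the first match, or None if there is none
            "Product_ref": product.findtext(".//div[@class='prod_naam']"),
            "Product_price": product.findtext(".//div[@class='prijs']"),
        }


# Searching from the document root returns every price on the page at once.
# This is the stand-in for what a //-prefixed expression does in Scrapy.
all_prices = [div.text for div in root.findall(".//div[@class='prijs']")]

rows = list(parse_products(root))
for row in rows:
    print(row["Product_ref"], row["Product_price"])
```

Each yielded dict holds exactly one name and one price, so an exporter that writes one row per item would produce the "Product Name, Product Price" layout from the question, while all_prices shows the "one long list" you get when the search is not scoped to the product.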

Upvotes: 1
