Reputation: 173
I've built my first Scrapy project but can't figure out the last hurdle. With my script below I get one long list in the CSV: first all the product prices and then all the product names.
What I would like to achieve is that for every product the price is next to it. For example:
Product Name, Product Price
Product Name, Product Price
My Scrapy project:
items.py
from scrapy.item import Item, Field

class PrijsvergelijkingItem(Item):
    Product_ref = Field()
    Product_price = Field()
My Spider called nvdb.py:
from scrapy.spider import BaseSpider
import scrapy.selector
from Prijsvergelijking.items import PrijsvergelijkingItem

class MySpider(BaseSpider):
    name = "nvdb"
    allowed_domains = ["vandenborre.be"]
    start_urls = ["http://www.vandenborre.be/tv-lcd-led/lcd-led-tv-80-cm-alle-producten"]

    def parse(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//ul[@id='prodlist_ul']")
        items = []
        for titles in titles:
            item = PrijsvergelijkingItem()
            item["Product_ref"] = titles.xpath("//div[@class='prod_naam']//text()[2]").extract()
            item["Product_price"] = titles.xpath("//div[@class='prijs']//text()[2]").extract()
            items.append(item)
        return items
Upvotes: 1
Views: 81
Reputation: 684
I am not sure if this can help you, but you can use OrderedDict from collections for your need.
from scrapy.spider import BaseSpider
import scrapy.selector
from collections import OrderedDict
from Prijsvergelijking.items import PrijsvergelijkingItem

class MySpider(BaseSpider):
    name = "nvdb"
    allowed_domains = ["vandenborre.be"]
    start_urls = ["http://www.vandenborre.be/tv-lcd-led/lcd-led-tv-80-cm-alle-producten"]

    def parse(self, response):
        hxs = scrapy.Selector(response)
        titles = hxs.xpath("//ul[@id='prodlist_ul']")
        items = []
        for titles in titles:
            item = OrderedDict(PrijsvergelijkingItem())
            item["Product_ref"] = titles.xpath("//div[@class='prod_naam']//text()[2]").extract()
            item["Product_price"] = titles.xpath("//div[@class='prijs']//text()[2]").extract()
            items.append(item)
        return items
You might also have to change the way you iterate over the dict:
for od in items:
    for key, value in od.items():
        print key, value
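The loop above uses the Python 2 print statement. A rough Python 3 sketch of the same iteration follows; the sample values are made up for illustration and do not come from the site.

```python
from collections import OrderedDict

# Hypothetical sample data in the shape the spider above would collect.
items = [
    OrderedDict([("Product_ref", "TV A"), ("Product_price", "199")]),
    OrderedDict([("Product_ref", "TV B"), ("Product_price", "249")]),
]

lines = []
for od in items:
    # items() yields pairs in insertion order: name first, then price.
    lines.append(", ".join(value for _key, value in od.items()))

for line in lines:
    print(line)
```

Each printed line holds one product's name followed by its price, because an OrderedDict keeps its keys in the order they were inserted.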
Upvotes: 0
Reputation: 473873
You need to switch your XPath expressions to work in the context of every "product". In order to do this, you need to prepend a dot to the expressions:
def parse(self, response):
    products = response.xpath("//ul[@id='prodlist_ul']/li")
    for product in products:
        item = PrijsvergelijkingItem()
        item["Product_ref"] = product.xpath(".//div[@class='prod_naam']//text()[2]").extract_first()
        item["Product_price"] = product.xpath(".//div[@class='prijs']//text()[2]").extract_first()
        yield item
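To illustrate why the search context matters, here is a minimal sketch that uses only the standard library instead of Scrapy. The markup is a simplified, hypothetical stand-in for the real page. ElementTree does not accept absolute paths on an element, so the "whole document" case is emulated by searching from the root, which is roughly what an expression starting with // does in Scrapy even when it is called on a sub-selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the real product list.
SAMPLE = """
<ul id="prodlist_ul">
  <li>
    <div class="prod_naam">TV A</div>
    <div class="prijs">199</div>
  </li>
  <li>
    <div class="prod_naam">TV B</div>
    <div class="prijs">249</div>
  </li>
</ul>
"""

root = ET.fromstring(SAMPLE.strip())

# Searching from the root collects every match in the document at once,
# which gives the "all names, then all prices" shape from the question.
all_names = [d.text for d in root.findall(".//div[@class='prod_naam']")]
all_prices = [d.text for d in root.findall(".//div[@class='prijs']")]

# Searching relative to each <li> keeps a name together with its own price.
rows = []
for li in root.findall("./li"):
    name = li.find(".//div[@class='prod_naam']").text
    price = li.find(".//div[@class='prijs']").text
    rows.append((name, price))

print(all_names, all_prices)
print(rows)
```

The second loop produces one (name, price) pair per product, which is the row layout the question asks for.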
I've also improved the code a little bit:
- iterated over the list items (ul -> li) and not just ul, and fixed the expression
- used the response.xpath() shortcut method
- used extract_first() instead of extract()
- used yield instead of collecting items in a list and then returning
Upvotes: 1