I'm scraping a few items from this site, but the spider only grabs the items from the first product and doesn't loop over the rest. I know I'm making a simple, silly mistake; if you could just point out where I went wrong, I'd appreciate it.
Here is the spider:
from scrapy.item import Item, Field
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
import re
from zoomer.items import ZoomerItem

class ZoomSpider(BaseSpider):
    name = "zoomSp"
    allowed_domains = ["zoomer.ge"]
    start_urls = [
        "http://zoomer.ge/index.php?cid=35&act=search&category=1&search_type=mobile"
    ]

    def parse(self, response):
        sel = Selector(response)
        titles = sel.xpath('//div[@class="productContainer"]/div[5]')
        items = []
        for t in titles:
            item = ZoomerItem()
            item["brand"] = t.xpath('//div[@class="productListContainer"]/div[3]/text()').re('^([\w, ]+)')
            item["price"] = t.xpath('//div[@class="productListContainer"]/div[4]/text()').extract()[0].strip()
            item["model"] = t.xpath('//div[@class="productListContainer"]/div[3]/text()').re('\s+(.*)$')[0].strip()
            items.append(item)
        return(items)
P.S. I also can't get the regex for "brand" to extract only the first word, "BlackBerry", from the string "BlackBerry P9981 Porsche Design".
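The <div/> element with the class productContainer is just a container and appears only once, so it does not repeat. The repeating element you want to iterate over is the one with the class productListContainer. The XPath expressions inside the loop also have to be relative (no leading //): an expression starting with // searches the whole document even when it is called on t, so every item would get the first product's values.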
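A minimal sketch of both points, assuming Python's standard-library xml.etree.ElementTree as a stand-in for Scrapy's Selector (it supports only a limited XPath subset, and ".//" plays the role of a leading "//"). The markup is a hypothetical, simplified fragment that only imitates the layout described in the question, not the real zoomer.ge page, and the helper names product_nodes, titles_from_root and titles_relative are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: NOT the real zoomer.ge page.
# One outer container wraps several repeating product blocks.
HTML = """
<html><body>
  <div class="productContainer">
    <div/><div/><div/><div/>
    <div>
      <div class="productListContainer">
        <div/><div/>
        <div>BlackBerry P9981 Porsche Design</div>
        <div class="productListPrice"><div>1200</div></div>
      </div>
      <div class="productListContainer">
        <div/><div/>
        <div>Nokia Lumia 920</div>
        <div class="productListPrice"><div>800</div></div>
      </div>
    </div>
  </div>
</body></html>
"""

# The question's loop target (appears once) and the answer's (one per product).
CONTAINER_PATH = ".//div[@class='productContainer']/div[5]"
PRODUCT_PATH = CONTAINER_PATH + "/div[@class='productListContainer']"


def product_nodes(root):
    """Return one element per product block."""
    return root.findall(PRODUCT_PATH)


def titles_from_root(root):
    """Buggy pattern: every lookup starts from the document root (like a
    leading // in Scrapy), so find() returns the first product's title
    on every iteration."""
    return [root.find(".//div[@class='productListContainer']/div[3]").text
            for _ in product_nodes(root)]


def titles_relative(root):
    """Fixed pattern: each lookup is relative to its own product block."""
    return [p.find("div[3]").text for p in product_nodes(root)]


root = ET.fromstring(HTML.strip())
print(len(root.findall(CONTAINER_PATH)))  # 1: the old loop runs once
print(len(product_nodes(root)))           # 2: one per product
print(titles_from_root(root))  # ['BlackBerry P9981 Porsche Design', 'BlackBerry P9981 Porsche Design']
print(titles_relative(root))   # ['BlackBerry P9981 Porsche Design', 'Nokia Lumia 920']
```

In Scrapy terms, t.xpath('//div[...]/div[3]/text()') behaves like titles_from_root, while t.xpath('div[3]/text()') behaves like titles_relative.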
def parse(self, response):
    sel = Selector(response)
    # Iterate over each repeating product block, not the single outer container.
    titles = sel.xpath('//div[@class="productContainer"]/div[5]/div[@class="productListContainer"]')
    items = []
    for t in titles:
        item = ZoomerItem()
        # Relative paths (no leading //) keep each lookup inside this product.
        item["brand"] = t.xpath('div[3]/text()').re(r'^([\w\-]+)')
        item["price"] = t.xpath('div[@class="productListPrice"]/div/text()').extract()
        item["model"] = t.xpath('div[3]/text()').re(r'\s+(.*)$')[0].strip()
        items.append(item)
    return items
This function is untested (I'm not a Python developer), so you might have to adjust it a bit.
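For the "brand" question in the P.S., here is a small sketch of how the two patterns behave on the sample title. It uses the standard-library re.findall, assuming it approximates Scrapy's .re() for a single extracted string (both return the captured group matches as a list); the variable names are illustrative.

```python
import re

title = "BlackBerry P9981 Porsche Design"

# Question's pattern: [\w, ]+ also matches spaces, so it swallows the whole title.
print(re.findall(r'^([\w, ]+)', title))   # ['BlackBerry P9981 Porsche Design']
# Answer's pattern: [\w\-]+ stops at the first space, leaving only the brand.
print(re.findall(r'^([\w\-]+)', title))   # ['BlackBerry']
# Model: skip the first run of whitespace and capture the rest.
print(re.findall(r'\s+(.*)$', title))     # ['P9981 Porsche Design']

# Regex-free alternative: split once on whitespace.
brand, model = title.split(maxsplit=1)
print(brand, "/", model)                  # BlackBerry / P9981 Porsche Design
```

The split variant is a swapped-in alternative, not what the answer uses; it assumes the brand is always a single whitespace-free word.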