user3404005

Reputation: 187

Can't get additional items from url

I'm scraping a few items from this site, but the spider grabs items only from the first product and doesn't loop any further. I know I'm making a simple, stupid mistake, so if you can point out where I got this wrong, I'd appreciate it.

Here is the spider:

from scrapy.item import Item, Field
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
import re 

from zoomer.items import ZoomerItem


class ZoomSpider(BaseSpider):
    name = "zoomSp"
    allowed_domains = ["zoomer.ge"]
    start_urls = [
        "http://zoomer.ge/index.php?cid=35&act=search&category=1&search_type=mobile"
    ]

    def parse(self, response):
        sel = Selector(response)
        titles = sel.xpath('//div[@class="productContainer"]/div[5]')
        items = []
        for t in titles:
            item = ZoomerItem()
            item["brand"] = t.xpath('//div[@class="productListContainer"]/div[3]/text()').re('^([\w, ]+)')
            item["price"] = t.xpath('//div[@class="productListContainer"]/div[4]/text()').extract()[0].strip()
            item["model"] = t.xpath('//div[@class="productListContainer"]/div[3]/text()').re('\s+(.*)$')[0].strip()

            items.append(item)

        return(items)   

P.S. I also can't get the regex for "brand" to capture only the first word, "BlackBerry", from the string:

"BlackBerry P9981 Porsche Design".

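For reference, the pattern ^([\w, ]+) includes a space and a comma in the character class, so it keeps matching past the first word. Below is a minimal sketch of one way to split such a title using only the standard re module. The helper name split_title is made up for illustration, and it assumes the brand is always the first whitespace-delimited word, which will not hold for multi-word brands.

```python
import re

title = "BlackBerry P9981 Porsche Design"


def split_title(text):
    """Split a product title into (brand, model) at the first run of whitespace.

    Hypothetical helper: assumes the brand is a single word made of
    letters, digits, underscores, or hyphens.
    """
    match = re.match(r"\s*([\w\-]+)\s*(.*)$", text)
    if match is None:
        return None, None
    return match.group(1), match.group(2).strip()


brand, model = split_title(title)
print(brand)
print(model)
```

Leaving the space out of the character class is what stops the first group at the end of the first word.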
Upvotes: 3

Views: 54

Answers (1)

dirkk

Reputation: 6218

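The <div/> element with the class productContainer is just a container and appears only once, so looping over it gives you a single iteration. The repeating element you want to iterate over is the one with the class productListContainer. Inside the loop, the XPath expressions also have to be relative to the current element (no leading //); an expression starting with // is evaluated against the whole document and returns the same nodes on every iteration.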

def parse(self, response):
    sel = Selector(response)
    # Select the repeating product blocks, not the single outer container.
    titles = sel.xpath('//div[@class="productContainer"]/div[5]/div[@class="productListContainer"]')
    items = []
    for t in titles:
        item = ZoomerItem()
        # Paths without a leading // are evaluated relative to the current block.
        item["brand"] = t.xpath('div[3]/text()').re(r'^([\w\-]+)')
        item["price"] = t.xpath('div[@class="productListPrice"]/div/text()').extract()
        item["model"] = t.xpath('div[3]/text()').re(r'\s+(.*)$')[0].strip()
        items.append(item)

    return items

This function is untested, as I am not a Python guy, so you might have to fiddle around with it a bit.
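The difference between querying the whole document and querying relative to the current node can be sketched without Scrapy. The snippet below is only an illustration: the markup is invented (loosely reusing the class names from the question, not the real page), and it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's Selector. ElementTree does not accept a leading // on an element, so the "absolute" case is simulated by searching from the root inside the loop.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page structure may differ.
SAMPLE = """
<div class="productContainer">
  <div class="productListContainer">
    <div>img</div><div>rating</div><div>BlackBerry P9981 Porsche Design</div>
  </div>
  <div class="productListContainer">
    <div>img</div><div>rating</div><div>Nokia Lumia 520</div>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)

# The repeating product blocks: this is what the loop should iterate over.
products = root.findall("./div[@class='productListContainer']")

# Searching from the document root on every iteration ignores the current
# product and always yields the first match.
from_root = [
    root.find(".//div[@class='productListContainer']/div[3]").text
    for _ in products
]

# Searching relative to the current product yields one value per product.
relative = [p.find("div[3]").text for p in products]

print(from_root)
print(relative)
```

With this sample, the root-based lookup repeats the first product's title for every iteration, while the relative lookup returns each product's own title, which is the same symptom and fix as in the spider.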

Upvotes: 3
