Tianhe Xie

Reputation: 261

Scraping data between two spans scrapy

I'm scraping a website and want to get the price of every product on the first page. Below is the HTML for one product. I want to extract the 99.

<div class = 'item-bg'>
    <div class = 'product-box'>
        <div class = 'res-info'>
            <div class = 'price-box'>
                <span class = 'def-price selectorgadget_rejected'>
                    <i>$</i>
                    99
                    <i>.99</i>
                </span>
            </div>
        </div>
    </div>
</div>

I don't think I can use the def-price class, because some products have 'selectorgadget_rejected' after it and others have 'selectorgadget_suggested'. My current code is:

product_info = response.css('.item-bg')
for product in product_info:
    product_price_sn = product.css('.price-box').extract() 

It's not getting 99 and I'm not sure how to fix it. Any ideas?

Here is the screenshot of the full HTML info: suning html

Upvotes: 2

Views: 850

Answers (1)

renatodvc

Reputation: 2564

I generally prefer XPath to CSS. In XPath you can use the contains() function to match on part of the class attribute, like:

response.xpath('//span[contains(@class, "def-price selectorgadget")]//text()').extract() 
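  • This extracts the text from every <span> on the page whose class attribute contains the string def-price selectorgadget, whether the full class name is selectorgadget_rejected or selectorgadget_suggested. Note that contains() does a substring match on the attribute value, so it relies on the two class names appearing in this order.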

Or using the pre-selected product_info:

product_info = response.css('.item-bg')
for product in product_info:
    product_price_sn = product.xpath('div/div/div/span[contains(@class, "def-price selectorgadget")]//text()').extract() 

I'm using the full relative path here because only a snippet of the HTML was posted.

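If you want only the 99 outside the <i> tags, use /text() instead of //text(). /text() returns just the span's own text nodes, so expect surrounding whitespace that you will need to strip.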
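
With indented markup like the posted snippet, the list returned by /text() will likely contain whitespace-only strings around the number. A small helper can pick out the real value. This is only a sketch: first_non_blank and the sample list are illustrative names and data, not part of Scrapy.

```python
def first_non_blank(chunks):
    """Return the first chunk that still has content after stripping, else None."""
    for chunk in chunks:
        text = chunk.strip()
        if text:
            return text
    return None


# Illustrative input: whitespace-only nodes around the number, as the
# indentation in the posted markup would produce.
raw_chunks = ["\n    ", "\n    99\n    ", "\n  "]
whole_part = first_non_blank(raw_chunks)
print(whole_part)
```

In the loop you would pass the result of the .extract() call to the helper instead of the hard-coded list.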


CSS Selector

If you would rather stick with CSS selectors, this might work:

product.css('.price-box span::text').extract()

Upvotes: 1
