chrisHG

Reputation: 80

Scrapy Stripping Comma

import scrapy
import pandas as pd
from ..items import HomedepotpricespiderItem
from scrapy.http import Request


class HomedepotspiderSpider(scrapy.Spider):
    name = 'homeDepotSpider'
    allowed_domains = ['homedepot.com']

    
    start_urls = ['https://www.homedepot.com/pep/304660691']#.format(omsID = omsID)
        #for omsID in omsList]

    def parse(self, response):
        # call home depot function
        for item in self.parseHomeDepot(response):
            yield item

    def parseHomeDepot(self, response):

        #get top level item
        items = response.css('#zone-a-product')
        for product in items:
            item = HomedepotpricespiderItem()

            # get SKU
            productSKU = product.css('.product-info-bar__detail:nth-child(2)::text').getall()

            # get rid of all the stuff I don't need
            #productSKU = [x.strip(' ') for x in productSKU] #whiteSpace
            #productSKU = [x.strip(',') for x in productSKU] 
            #productSKU = [x.strip('\n') for x in productSKU]
            #productSKU = [x.strip('\t') for x in productSKU]
            #productSKU = [x.strip(' Model# ') for x in productSKU] #gets rid of the model name

So my selectors are fine and they get the correct fields.

When I run it with the strip lines commented out, I get 'Model #,RA30'.

Then, when I run it with the strip lines uncommented, I get ',RA30'.

I'm running the spider with this command in the terminal: scrapy crawl homeDepotSpider -t csv -o - > "/Users/userName/Desktop/homeDepotv2Helpers/homeDepotTest.csv"

The output above is copied directly from the CSV.

Edit:

I've also tried this:

productSKU = [x.replace(' ,', '') for x in productSKU] 

and that didn't work. This is the direct output from the terminal: {'productSKU': ['', 'RA30']}

Upvotes: 0

Views: 139

Answers (3)

stranac

Reputation: 28266

Your selector gives you a list of two elements: ['Model #', 'RA30'].

To get only the SKU, simply use indexing:

productSKU = product.css('.product-info-bar__detail:nth-child(2)::text').getall()[1]

If there's a chance that a product won't have an SKU, make sure to handle exceptions correctly.
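A minimal sketch of the indexing with the missing-SKU case handled, in plain Python. The hard-coded lists stand in for what .getall() returns, and the helper name extract_sku is hypothetical, not part of Scrapy. Returning a single string instead of a list should also get rid of the comma in the CSV, since (as far as I know) Scrapy's CSV exporter joins the elements of a list field with a comma by default.

```python
def extract_sku(texts):
    """Return the SKU from the selector's text nodes, or None if it is missing."""
    # Drop surrounding whitespace and discard empty text nodes.
    cleaned = [t.strip() for t in texts if t.strip()]
    try:
        # Assumption: the first node is the 'Model #' label, the second is the SKU.
        return cleaned[1]
    except IndexError:
        # The product has no SKU node.
        return None


print(extract_sku(['Model #', 'RA30']))  # RA30
print(extract_sku(['Model #']))          # None
```

In the spider this would be called as extract_sku(product.css('.product-info-bar__detail:nth-child(2)::text').getall()).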

Upvotes: 2

gangabass

Reputation: 10666

Why don't you want to use XPath + regex?

product_model = response.xpath('//h2[@class="product-info-bar__detail"][contains(., "Model #")]/text()').re_first(r'#(.+)')
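The regex idea can be tried outside Scrapy with the standard re module. This is only a sketch under one assumption: the label and the value may arrive as separate text nodes (the question's output suggests they do), so the nodes are joined before matching. The function name model_from_nodes is hypothetical, and the pattern is a slightly adjusted version of the one above that also skips whitespace after the '#'.

```python
import re


def model_from_nodes(texts):
    """Join the text nodes and return whatever follows the '#', or None."""
    joined = ''.join(texts)
    # Capture everything after '#', skipping any whitespace right after it.
    match = re.search(r'#\s*(.+)', joined)
    return match.group(1).strip() if match else None


print(model_from_nodes(['Model #', 'RA30']))  # RA30
print(model_from_nodes(['Model #']))          # None
```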

Upvotes: 1

Patrick Klein

Reputation: 1201

The strip function only removes characters from the beginning and end of a string, and it treats its argument as a set of characters rather than as a substring. If you want to remove a character wherever it appears in the string, use the replace function. However, if you only want to remove the comma at the beginning or at the end of your string, you should repeat your line productSKU = [x.strip(',') for x in productSKU] once more after productSKU = [x.strip(' Model# ') for x in productSKU].
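A short illustration of the difference between strip and replace, using the strings from the question plus two made-up ones (',RA30,' and 'RA,30') for the comma cases:

```python
# str.strip(chars) trims any of the given characters from both ends only.
label = 'Model #'.strip(' Model# ')    # every character is in the set, so nothing is left
edges = ',RA30,'.strip(',')            # leading and trailing commas are removed
inner = 'RA,30'.strip(',')             # a comma in the middle is untouched

# str.replace(old, new) removes the substring wherever it occurs.
everywhere = 'RA,30'.replace(',', '')

print(repr(label), repr(edges), repr(inner), repr(everywhere))
# '' 'RA30' 'RA,30' 'RA30'
```

Note that the empty string left over from 'Model #' is still an element of the list, which matches the ['', 'RA30'] shown in the question.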

Upvotes: 2
