Tajs

Reputation: 621

Strip \t from beginning and end of Scrapy response

I want to clean my Scrapy response. I'm building a simple price monitor, but I'm having trouble getting a clean price.

I get the following response:

['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']

Ideally, I would like it to be (as a float?):

272.28

I'm using Scrapy items to store the values, like this:

def parse_item(self, response):
    item = HobbyItem()
    item['new_price'] = response.css('span.price.new-price').extract()
    item['base_price'] = response.css('span.price.base-price').extract()

Thanks for the help!

Upvotes: 0

Views: 280

Answers (3)

Tajs

Reputation: 621

After all the help, this is the solution that worked for me (though it is far from the most efficient):

def parse_item(self, response):
    item = HobbyItem()
    if response.css('span.price.new-price::text').extract():
        new_price = response.css('span.price.new-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['new_price'] = final_new_price
    else:
        item['new_price'] = '0'
    if response.css('span.base-price::text').extract():
        new_price = response.css('span.base-price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['base_price'] = final_new_price
    else:
        item['base_price'] = '0'
    if response.css('span.price::text').extract():
        new_price = response.css('span.price::text').extract()
        new_price_clean = new_price[0]
        new_price_clean_strip = new_price_clean.strip()
        new_price_clean_euro = new_price_clean_strip.replace("€", "")
        final_new_price = float(new_price_clean_euro)
        item['price'] = final_new_price
    else:
        item['price'] = '0'
    item['name'] = response.css('h1>span::text').extract()
    item['url'] = response.url
    yield item
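
The three branches above differ only in the selector and the item key, so the cleaning step could be factored into one helper. The following is a minimal sketch, assuming the same '272.28€' text format. The names to_price, fill_prices and PRICE_SELECTORS are hypothetical, and a plain dict stands in for HobbyItem so the snippet runs without Scrapy. Note that it falls back to the float 0.0 rather than the string '0', so every price field has the same type.

```python
from typing import Callable, Dict, List

# Hypothetical mapping of item field -> CSS selector, copied from the code above.
PRICE_SELECTORS: Dict[str, str] = {
    "new_price": "span.price.new-price::text",
    "base_price": "span.base-price::text",
    "price": "span.price::text",
}


def to_price(texts: List[str], default: float = 0.0) -> float:
    """Convert the list returned by .extract() to a float, or return default."""
    if not texts:
        return default
    return float(texts[0].strip().replace("€", ""))


def fill_prices(extract: Callable[[str], List[str]], item: dict) -> dict:
    """Fill every price field; `extract` maps a selector to a list of strings."""
    for field, selector in PRICE_SELECTORS.items():
        item[field] = to_price(extract(selector))
    return item


# Stand-in for `lambda sel: response.css(sel).extract()` inside parse_item.
fake_page = {"span.price.new-price::text": ["\n\t\t272.28€\t\t"]}
print(fill_prices(lambda sel: fake_page.get(sel, []), {}))
# {'new_price': 272.28, 'base_price': 0.0, 'price': 0.0}
```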

Upvotes: 0

Uli Sotschok

Reputation: 1236

It looks like the text is inside a list, so you first need to take it out of the list and then strip it:

>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> text = response[0]
>>> text
'\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t'
>>> clean_text = text.strip()
>>> clean_text
'272.28€'
>>> number_text = clean_text.replace("€", "")
>>> number_text
'272.28'
>>> number = float(number_text)
>>> number
272.28

Or as a one-liner:

>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> float(response[0].strip().replace("€", ""))
272.28
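
The one-liner assumes the list is non-empty and that the only extra character is €. A slightly more defensive variant, which swaps in a regular expression instead of strip/replace, could look like the sketch below. The function name first_price and the pattern are illustrative assumptions, not part of Scrapy, and the pattern only handles a dot as the decimal separator.

```python
import re
from typing import List, Optional

# Matches an optional sign, digits, and an optional decimal part, e.g. "272.28".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def first_price(values: List[str]) -> Optional[float]:
    """Return the first number found in the first list element, or None."""
    if not values:
        return None
    match = _NUMBER.search(values[0])
    return float(match.group()) if match else None


raw = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
print(first_price(raw))  # 272.28
print(first_price([]))   # None
```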

Upvotes: 2

Asiful Nobel

Reputation: 343

Use this:

def parse_item(self, response):
    item = HobbyItem()
    item['new_price'] = response.css('span.price.new-price::text').get().replace('€', '').strip()
    item['base_price'] = response.css('span.price.base-price::text').get().replace('€', '').strip()

Here the get() method returns the first element matching the CSS selector, replace() removes the euro sign, and strip() removes the surrounding whitespace. You can read more in the Scrapy selectors documentation.
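
One caveat: get() returns None when nothing matches, so chaining .replace() directly onto it would raise AttributeError on pages that lack one of the prices. The result is also still a string, not the float asked for in the question. A minimal sketch of a None-safe wrapper follows; clean_price is a hypothetical helper name, and the commented call shows an assumed usage inside parse_item.

```python
from typing import Optional


def clean_price(text: Optional[str]) -> Optional[float]:
    """Remove the euro sign and whitespace, then convert to float; pass None through."""
    if text is None:
        return None
    return float(text.replace("€", "").strip())


# Assumed usage inside parse_item:
#   item['new_price'] = clean_price(response.css('span.price.new-price::text').get())
print(clean_price("\n\t\t272.28€\t\t"))  # 272.28
print(clean_price(None))                 # None
```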

Upvotes: 0
