Reputation: 621
I want to clean my Scrapy response. I'm building a simple price monitor, but I'm having trouble getting a clean price.
I get the following response:
['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
Ideally, I would like it to be (as a float?):
272.28
I'm using Scrapy items to store the values, like this:
    def parse_item(self, response):
        item = HobbyItem()
        item['new_price'] = response.css('span.price.new-price').extract()
        item['base_price'] = response.css('span.price.base-price').extract()
Thanks for the help!
Upvotes: 0
Views: 280
Reputation: 621
After all the help, this is the solution that worked for me (it is far from the most efficient):
    def parse_item(self, response):
        item = HobbyItem()
        if response.css('span.price.new-price::text').extract():
            new_price = response.css('span.price.new-price::text').extract()
            new_price_clean = new_price[0]
            new_price_clean_strip = new_price_clean.strip()
            new_price_clean_euro = new_price_clean_strip.replace("€", "")
            final_new_price = float(new_price_clean_euro)
            item['new_price'] = final_new_price
        else:
            item['new_price'] = '0'
        if response.css('span.base-price::text').extract():
            new_price = response.css('span.base-price::text').extract()
            new_price_clean = new_price[0]
            new_price_clean_strip = new_price_clean.strip()
            new_price_clean_euro = new_price_clean_strip.replace("€", "")
            final_new_price = float(new_price_clean_euro)
            item['base_price'] = final_new_price
        else:
            item['base_price'] = '0'
        if response.css('span.price::text').extract():
            new_price = response.css('span.price::text').extract()
            new_price_clean = new_price[0]
            new_price_clean_strip = new_price_clean.strip()
            new_price_clean_euro = new_price_clean_strip.replace("€", "")
            final_new_price = float(new_price_clean_euro)
            item['price'] = final_new_price
        else:
            item['price'] = '0'
        item['name'] = response.css('h1>span::text').extract()
        item['url'] = response.url
        yield item
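The three branches above repeat the same steps, so they could be factored into one function. This is only a minimal sketch: `clean_price` and its `default` parameter are hypothetical names, and it assumes the argument is the list of strings that `.extract()` returns and that the price always uses a dot as the decimal separator.

```python
def clean_price(extracted, default=0.0):
    """Convert the list returned by .extract() into a float.

    `extracted` is assumed to be a list of strings; `default` is returned
    when the selector matched nothing (hypothetical fallback value).
    """
    if not extracted:
        return default
    # Take the first match, drop surrounding whitespace, remove the euro sign.
    return float(extracted[0].strip().replace("€", ""))
```

Each branch would then shrink to a single line, for example item['new_price'] = clean_price(response.css('span.price.new-price::text').extract()). Returning a float default also avoids mixing floats with the string '0' in the same field.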
Upvotes: 0
Reputation: 1236
It seems the text is inside a list, so you first need to take it out of the list and then strip it:
>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> text = response[0]
>>> text
'\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t'
>>> clean_text = text.strip()
>>> clean_text
'272.28€'
>>> number_text = clean_text.replace("€", "")
>>> number_text
'272.28'
>>> number = float(number_text)
>>> number
272.28
Or as one-liner:
>>> response = ['\n\t\t\t\t\t\t\t\t\t\t\t\t272.28€\t\t\t\t\t\t\t\t\t\t\t']
>>> float(response[0].strip().replace("€", ""))
272.28
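The one-liner raises a ValueError if anything other than whitespace and the euro sign surrounds the number. If that is a concern, a regular expression can pull out just the numeric part. This is a sketch under assumptions: `extract_price` is a hypothetical name, and the pattern assumes a dot decimal separator and no thousands separators.

```python
import re

# Matches an integer or a dot-separated decimal such as "272.28".
_PRICE_RE = re.compile(r"\d+(?:\.\d+)?")


def extract_price(text):
    """Return the first number found in `text` as a float, or None."""
    match = _PRICE_RE.search(text)
    return float(match.group()) if match else None
```

It would be called as extract_price(response[0]) on the list from the question.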
Upvotes: 2
Reputation: 343
Use this:
    def parse_item(self, response):
        item = HobbyItem()
        item['new_price'] = response.css('span.price.new-price::text').get().replace('€', '').strip()
        item['base_price'] = response.css('span.price.base-price::text').get().replace('€', '').strip()
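Here, the get() method returns the first element matching the CSS selector, replace() removes the euro sign, and strip() removes the surrounding whitespace. The result is still a string, so wrap it in float() if you need a number. The Scrapy selectors documentation has more details.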
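One caveat: get() returns None when the selector matches nothing, so chaining .replace() directly onto it raises an AttributeError on pages where a price is missing. A minimal sketch of a guard follows; `price_or_none` is a hypothetical name, and it assumes the price uses a dot as the decimal separator.

```python
def price_or_none(text):
    """Convert the value returned by .get() into a float.

    `text` is either a string or None (when the selector matched nothing).
    """
    if text is None:
        return None
    # Remove the euro sign, then the surrounding tabs and newlines.
    return float(text.replace("€", "").strip())
```

It would be used as item['new_price'] = price_or_none(response.css('span.price.new-price::text').get()).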
Upvotes: 0