Reputation: 61
I'm building a scraper to get the product prices from a website.
At the moment I have the following code:
def parse(self, response):
    for tank in response.xpath('//html/body/div/div[4]/div/div/div/table[1]/tr/td/div/span/span'):
        item = VapeItem()
        item["price"] = tank.xpath("text()").extract()
        yield item
And here is the json output:
{"price": ["5,00 \u20ac\n \n \n \n \n \n *\n \n \n \n "]},
I've tried encode("utf-8"), strip, and replace, but nothing seems to work.
My question is: how do I make that output readable? Either "5.00 €" (with \u20ac shown as the euro sign) or just "5.00" would be fine.
Thanks in advance!
Upvotes: 4
Views: 793
Reputation: 180481
The simplest way may be to split once and replace the comma with a decimal point:
item["price"] = tank.xpath("text()").extract()[0].split(None,1)[0].replace(",",".")
That will leave you with 5.00. Because there is a * in the string, a plain strip() would not work. You could pass that character to strip, i.e. [0].rstrip("\n* "), but that would break if there were other errant characters.
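The split-and-replace idea can be wrapped in a small helper. This is only a sketch, written for Python 3: `clean_price` is a hypothetical name, and it assumes the raw string starts with the number, as in the output shown in the question.

```python
def clean_price(raw):
    """Return the leading price token with a decimal point, e.g. '5,00 ...' -> '5.00'."""
    # split(None, 1) splits on any run of whitespace, at most once,
    # so the first element is the number and the rest is discarded.
    parts = raw.split(None, 1)
    if not parts:
        return ""  # empty or whitespace-only input
    return parts[0].replace(",", ".")


raw = "5,00 \u20ac\n \n \n \n \n \n *\n \n \n \n "
print(clean_price(raw))
```

Because only the first token is kept, the trailing asterisk and padding never need to be stripped explicitly.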
If you want the euro sign too, you can decode('unicode-escape'):
d={"price": ["5,00 \u20ac\n \n \n \n \n \n *\n \n \n \n "]}
d["price"] = d["price"][0].decode('unicode-escape').rstrip("\n * ").replace(",",".")
print(d["price"])
5.00 €
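On Python 3, str has no decode method, and the extracted text should already be Unicode, so the `\u20ac` in the output is most likely just JSON escaping of a non-ASCII character. A minimal sketch with the standard-library json module, assuming Python 3 and the same sample value:

```python
import json

item = {"price": ["5,00 \u20ac\n \n \n \n \n \n *\n \n \n \n "]}

# Strip trailing whitespace and the stray asterisk, then fix the decimal separator.
price = item["price"][0].rstrip("\n* ").replace(",", ".")

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps({"price": price})
# ensure_ascii=False writes the euro sign as-is.
readable = json.dumps({"price": price}, ensure_ascii=False)

print(escaped)
print(readable)
```

In Scrapy versions that support the FEED_EXPORT_ENCODING setting, setting it to "utf-8" should have a similar effect on the exported JSON; check the documentation for your version.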
If you want to combine it with split and keep the sign, while also formatting it a bit more nicely:
p,s,_ = d["price"][0].split(None, 2)
d["price"] = u"{}{}".format(s.decode("unicode_escape"),p.replace(",","."))
print(d["price"])
Which will give you:
€5.00
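If the price is going to be compared or summed later, it may be worth keeping the amount as a number rather than a string. The following is a sketch for Python 3, not part of the approach above: `split_price` is a hypothetical helper, and it assumes the string always starts with an amount followed by a currency symbol, with no thousands separators.

```python
from decimal import Decimal


def split_price(raw):
    """Split '5,00 € ...' into (Decimal amount, currency symbol)."""
    # The first two whitespace-separated tokens are the amount and the symbol;
    # everything after them (asterisk, padding) is ignored.
    # Raises ValueError if fewer than two tokens are present.
    amount, symbol = raw.split(None, 2)[:2]
    return Decimal(amount.replace(",", ".")), symbol


amount, symbol = split_price("5,00 \u20ac\n \n \n \n \n \n *\n \n \n \n ")
print("{}{}".format(symbol, amount))
```

Decimal keeps the two decimal places exactly, which a float would not guarantee.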
Upvotes: 2