Reputation:
I am extracting a field on a webpage, and the tag's HTML text content looks like this...
35 new
In python the extracted data looks like this...
35\xa0new
How do I deal with Unicode in Python to convert this to a regular string?
"35 new"
What library do I use?
Thanks
Upvotes: 0
Views: 679
Reputation: 799580
Avoid working with regular strings whenever possible; unicode strings are generally more useful for text, and there are many well-known solutions for manipulating and dealing with them.
Upvotes: 3
Reputation: 376082
You are getting unicode strings from the parser. You can replace certain characters if you prefer others. For example, your \xa0 is a non-breaking space, and you can replace it with a regular space:
text = text.replace(u"\xa0", u" ")
There could be many of these characters that you want to change, so it might be a long process of finding all the ones that occur in your data.
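If several characters need replacing, a translation table may save chaining many replace() calls. Below is a minimal sketch; the names REPLACEMENTS, clean_text and clean_text_nfkc are made up for illustration, and the characters in the table are only guesses at what might occur in your data. The second helper uses unicodedata.normalize with the "NFKC" form, which folds compatibility characters such as the non-breaking space into their plain equivalents, but it also changes other characters (ligatures and full-width forms, for instance), so check that this is acceptable for your text.

```python
import unicodedata

# Hypothetical mapping of code points to replacements; adjust it to
# whatever characters actually show up in your data.
REPLACEMENTS = {
    0x00A0: u" ",  # no-break space -> regular space
    0x2009: u" ",  # thin space -> regular space
    0x2019: u"'",  # right single quotation mark -> apostrophe
}


def clean_text(text):
    """Replace every character listed in REPLACEMENTS in a single pass."""
    return text.translate(REPLACEMENTS)


def clean_text_nfkc(text):
    """Alternative: let Unicode compatibility normalization do the folding."""
    return unicodedata.normalize("NFKC", text)


print(clean_text(u"35\xa0new"))
print(clean_text_nfkc(u"35\xa0new"))
```

Both calls should turn the extracted value from the question into the text with an ordinary space.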
Upvotes: 0