Reputation: 776
I have the following string:
word = u'Buffalo,\xa0IL\xa060625'
I don't want the "\xa0" in there. How can I get rid of it? The string I want is:
word = 'Buffalo, IL 60625'
Upvotes: 23
Views: 73041
Reputation: 3235
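You can use unicodedata to normalize compatibility characters such as \xa0 into their plain equivalents. Note that this does not strip every \x... character; it only rewrites those that have a compatibility decomposition.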
from unicodedata import normalize
normalize('NFKD', word)
>>> 'Buffalo, IL 60625'
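To make the behaviour of NFKD a bit more concrete, here is a minimal sketch. The variable names cleaned and cafe are only illustrative, and the second string is an added example, not part of the question. It suggests that the no-break space becomes a regular space, while an accented letter is decomposed rather than removed.

```python
import unicodedata

word = u'Buffalo,\xa0IL\xa060625'

# U+00A0 has a compatibility decomposition to U+0020, so NFKD maps it to a plain space.
cleaned = unicodedata.normalize('NFKD', word)

# An accented letter is not dropped: it is split into a base letter
# plus a combining mark (here 'e' followed by U+0301).
cafe = unicodedata.normalize('NFKD', u'caf\xe9')

print(repr(cleaned))
print(repr(cafe))
```

So normalization is a reasonable fit for \xa0 specifically, but it is probably not a general way to force a string down to ASCII.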
Upvotes: 10
Reputation: 308452
The most robust way would be to use the unidecode module to convert all non-ASCII characters to their closest ASCII equivalent automatically.
The character \xa0 (not \xa as you stated) is a NO-BREAK SPACE, and the closest ASCII equivalent would of course be a regular space.
import unidecode
word = unidecode.unidecode(word)
Upvotes: 32
Reputation: 365955
There is no \xa there. If you try to put that into a string literal, you're going to get a syntax error if you're lucky, or it's going to swallow up the next attempted character if you're not, because \x sequences always have to be followed by two hexadecimal digits.
What you have is \xa0, which is an escape sequence for the character U+00A0, aka "NO-BREAK SPACE".
I think you want to replace them with spaces, but whatever you want to do is pretty easy to write:
word.replace(u'\xa0', u' ') # replaced with space
word.replace(u'\xa0', u'0') # closest to what you were literally asking for
word.replace(u'\xa0', u'') # removed completely
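One detail worth spelling out: strings are immutable, so replace returns a new string and the result has to be assigned to something. A minimal sketch, where the name fixed is only illustrative:

```python
word = u'Buffalo,\xa0IL\xa060625'

# replace() does not modify word in place; it returns a new string.
fixed = word.replace(u'\xa0', u' ')

print(repr(fixed))
print(u'\xa0' in word)  # the original string still contains the no-break spaces
```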
Upvotes: 8
Reputation: 59185
This seems to work for getting rid of non-ASCII characters:
fixedword = word.encode('ascii','ignore')
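A caveat on this approach: 'ignore' drops the offending characters outright, so the words run together. On Python 3, encode also returns bytes rather than str. The sketch below assumes Python 3, and the names dropped and spaced are only illustrative. It shows the difference, along with one possible way to keep the spacing by replacing \xa0 first.

```python
word = u'Buffalo,\xa0IL\xa060625'

# 'ignore' silently discards anything that cannot be encoded as ASCII.
dropped = word.encode('ascii', 'ignore')

# Replacing the no-break space first preserves the word boundaries;
# decode() turns the bytes back into text on Python 3.
spaced = word.replace(u'\xa0', u' ').encode('ascii', 'ignore').decode('ascii')

print(dropped)
print(spaced)
```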
Upvotes: 3
Reputation: 310097
If you know for sure that is the only character you don't want, you can .replace it:
>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'
If you need to handle all non-ASCII characters, encoding and replacing bad characters might be a good start:
>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
Upvotes: 11