slopeofhope

Reputation: 776

How to remove this \xa0 from a string in python?

I have the following string:

 word = u'Buffalo,\xa0IL\xa060625'

I don't want the "\xa0" in there. How can I get rid of it? The string I want is:

word = 'Buffalo, IL 60625'

Upvotes: 23

Views: 73041

Answers (5)

Amir Imani

Reputation: 3235

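You can use unicodedata to get rid of the \xa0 characters: NFKD normalization replaces compatibility characters with their plain equivalents, and NO-BREAK SPACE (U+00A0) becomes an ordinary space. Note that it does not strip every non-ASCII character.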

>>> from unicodedata import normalize
>>> normalize('NFKD', word)
'Buffalo, IL 60625'
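To see what NFKD does and does not do, here is a minimal Python 3 sketch. The extra sample strings ('café', '€5') are illustrative only: \xa0 turns into a plain space, accented letters are split into a base letter plus a combining mark (still non-ASCII), and characters with no decomposition pass through untouched.

```python
from unicodedata import normalize

word = 'Buffalo,\xa0IL\xa060625'

# U+00A0 has a compatibility decomposition to U+0020, so it becomes a plain space.
print(normalize('NFKD', word))              # Buffalo, IL 60625

# 'é' is split into 'e' + U+0301 COMBINING ACUTE ACCENT, which is still non-ASCII.
print(ascii(normalize('NFKD', 'caf\xe9')))  # 'cafe\u0301'

# '€' (U+20AC) has no decomposition, so NFKD leaves it unchanged.
print(ascii(normalize('NFKD', '\u20ac5')))  # '\u20ac5'
```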

Upvotes: 10

Mark Ransom

Reputation: 308452

The most robust way would be to use the unidecode module to convert all non-ASCII characters to their closest ASCII equivalent automatically.

The character \xa0 (not \xa as you stated) is a NO-BREAK SPACE, and the closest ASCII equivalent would of course be a regular space.

import unidecode
word = unidecode.unidecode(word)
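unidecode is a third-party package (on PyPI as Unidecode). If adding a dependency isn't an option, a rough standard-library approximation is NFKD normalization followed by dropping whatever is still non-ASCII. This is a cruder, different technique than unidecode's transliteration: anything without a decomposition, such as '€', is simply dropped instead of being transliterated. The helper name to_ascii_approx is hypothetical.

```python
import unicodedata


def to_ascii_approx(text: str) -> str:
    """Hypothetical helper: approximate ASCII folding using only the stdlib."""
    # NFKD maps U+00A0 to a plain space and splits 'é' into 'e' + U+0301.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop anything still outside ASCII (combining marks, '€', CJK, ...).
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii_approx('Buffalo,\xa0IL\xa060625'))  # Buffalo, IL 60625
print(to_ascii_approx('Caf\xe9 \u20ac5'))           # Cafe 5
```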

Upvotes: 32

abarnert

Reputation: 365955

There is no \xa there. If you try to put that into a string literal, you're going to get a syntax error if you're lucky, or it's going to swallow up the next attempted character if you're not, because \x sequences always have to be followed by two hexadecimal digits.

What you have is \xa0, which is an escape sequence for the character U+00A0, aka "NO-BREAK SPACE".
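A quick way to confirm both points in Python 3: \xa0, \u00a0 and chr(0xA0) are the same single character, unicodedata.name reports its official name, and a truncated \xa escape is rejected when the source is compiled. A small sketch (the exact wording of the error message varies between Python versions, so it isn't shown):

```python
import unicodedata

nbsp = '\xa0'
# Three spellings of the same single code point, U+00A0.
print(nbsp == '\u00a0' == chr(0xA0), len(nbsp))  # True 1
print(unicodedata.name(nbsp))                    # NO-BREAK SPACE
print(nbsp.isspace())                            # True

# '\xa' needs two hex digits, so compiling it as source raises SyntaxError.
try:
    compile(r"'\xa'", '<test>', 'eval')
    rejected = False
except SyntaxError as exc:
    rejected = True
    print('SyntaxError:', exc.msg)
```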

I think you want to replace them with spaces, but whatever you want to do is pretty easy to write:

word.replace(u'\xa0', u' ') # replaced with space
word.replace(u'\xa0', u'0') # closest to what you were literally asking for
word.replace(u'\xa0', u'')  # removed completely
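Keep in mind that strings are immutable, so replace returns a new string and leaves word as it was; assign the result to keep it. A Python 3 sketch of what each call produces, plus one swapped-in alternative (split/join) that also catches other Unicode whitespace, at the cost of collapsing runs of spaces and trimming the ends:

```python
word = 'Buffalo,\xa0IL\xa060625'

# str.replace never modifies word; each call builds and returns a new string.
as_space = word.replace('\xa0', ' ')  # 'Buffalo, IL 60625'
as_zero = word.replace('\xa0', '0')   # 'Buffalo,0IL060625'
removed = word.replace('\xa0', '')    # 'Buffalo,IL60625'

# Alternative technique: split() with no arguments splits on any Unicode
# whitespace (U+00A0 included); joining with ' ' normalizes the separators.
collapsed = ' '.join(word.split())    # 'Buffalo, IL 60625'

print(repr(word))                     # 'Buffalo,\xa0IL\xa060625' (unchanged)
```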

Upvotes: 8

khelwood

Reputation: 59185

This seems to work for getting rid of non-ascii characters:

fixedword = word.encode('ascii','ignore')
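Two things to watch for with this on Python 3: str.encode returns bytes rather than a string, and the 'ignore' handler deletes each \xa0 outright instead of turning it into a space. A minimal sketch showing both, decoding back to str, and replacing the no-break spaces first when the separators should survive:

```python
word = 'Buffalo,\xa0IL\xa060625'

raw = word.encode('ascii', 'ignore')  # b'Buffalo,IL60625' (bytes, spaces gone)
fixedword = raw.decode('ascii')       # back to str: 'Buffalo,IL60625'

# Swapping the no-break spaces for ordinary spaces first keeps the separators.
kept = word.replace('\xa0', ' ').encode('ascii', 'ignore').decode('ascii')

print(fixedword, '|', kept)           # Buffalo,IL60625 | Buffalo, IL 60625
```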

Upvotes: 3

mgilson

Reputation: 310097

If you know for sure that is the only character you don't want, you can .replace it:

>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'

If you need to handle all non-ASCII characters, encoding and replacing bad characters might be a good start:

>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
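'replace' is only one of the built-in codec error handlers. A Python 3 sketch comparing a few of them on the same string (on Python 3 each call returns bytes, hence the b prefixes):

```python
word = 'Buffalo,\xa0IL\xa060625'

handlers = ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace')
results = {h: word.encode('ascii', h) for h in handlers}

for handler, encoded in results.items():
    print(f'{handler:18} {encoded!r}')
# ignore             b'Buffalo,IL60625'
# replace            b'Buffalo,?IL?60625'
# xmlcharrefreplace  b'Buffalo,&#160;IL&#160;60625'
# backslashreplace   b'Buffalo,\\xa0IL\\xa060625'
```

None of these turns \xa0 into a space, so if a plain space is what you want, replace it before encoding.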

Upvotes: 11
