user984003

Reputation: 29569

Python: replace nonbreaking space in Unicode

In Python, I have a Unicode string. It contains non-breaking spaces, which I want to convert to 'x'. A non-breaking space has the value chr(160). The following code works fine when I run it under Django from Eclipse on localhost: there are no errors and every non-breaking space is converted.

my_text = u"hello"
my_new_text = my_text.replace(chr(160), "x")

However, when I run it any other way (the Python command line, or Django via runserver instead of Eclipse), I get an error:

'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

I guess this error makes sense because it's trying to compare Unicode (my_text) to something that isn't Unicode. My questions are:

  1. If chr(160) isn't Unicode, what is it?
  2. How come this works when I run it from Eclipse? Understanding this would help me determine if I need to change other parts of my code. I have been testing my code from Eclipse.
  3. (most important) How do I solve my original problem of removing the non-breaking spaces? my_text is definitely going to be Unicode.

Upvotes: 6

Views: 12143

Answers (1)

Fred Foo

Reputation: 363727

  1. In Python 2, chr(160) is a byte string of length one whose only byte has the value 160, or 0xa0 in hex. It has no meaning as a character except in the context of a specific encoding.
  2. I'm not familiar with Eclipse, but it may be playing encoding tricks of its own.
  3. If you want the Unicode character NO-BREAK SPACE, i.e. code point 160, that's unichr(160).
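To illustrate point 1, here is a small sketch showing that the byte 0xa0 only becomes a character once a codec is chosen. The error in the question comes from Python 2 implicitly decoding the byte string as ASCII when it is mixed with a Unicode string. The variable names are made up for this illustration, and the snippet is written to run on Python 3 as well.

```python
# The single byte 0xA0 has no fixed meaning: the result depends on the codec.
raw = b"\xa0"

# Latin-1 maps byte 0xA0 to code point U+00A0 (NO-BREAK SPACE).
as_latin1 = raw.decode("latin-1")

# Code page 437 maps the same byte to a different character (U+00E1).
as_cp437 = raw.decode("cp437")

# ASCII only covers bytes 0-127, so decoding 0xA0 fails. This is the
# same failure Python 2 hits when it implicitly decodes chr(160).
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Because the outcome depends on which codec is applied, an environment that sets a different default encoding could plausibly hide the error, which may be what happens under Eclipse.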

For example, using unichr(160) in the replace call:

>>> u"hello\u00a0world".replace(unichr(160), "X")
u'helloXworld'
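As a variation on the same idea, the no-break space can be written directly as a Unicode escape, which avoids chr/unichr altogether and behaves the same on Python 2 and Python 3. This is only a sketch: NBSP and replace_nbsp are hypothetical names chosen for the example, not part of any library.

```python
# U+00A0 NO-BREAK SPACE, written as a Unicode literal (hypothetical constant name).
NBSP = u"\u00a0"


def replace_nbsp(text, replacement=u"x"):
    """Return text with every no-break space replaced by `replacement`."""
    # Both operands are Unicode strings, so no implicit ASCII decoding happens.
    return text.replace(NBSP, replacement)


my_text = u"hello\u00a0world"
my_new_text = replace_nbsp(my_text)
```

Passing u" " as the replacement would turn the non-breaking spaces into ordinary spaces instead of 'x'.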

Upvotes: 11
