pmiln099

Reputation: 1

Insert whitespace between non-ASCII characters in Python

I am creating a dictionary that requires each letter of a string to be separated by whitespace, and I am using join to do it. The problem arises when the string contains non-ASCII characters: join splits each of them into its individual UTF-8 bytes, and the result is garbage.

Example:

>>> word = 'məsjø'
>>> ' '.join(word)

Gives me:

'm \xc9 \x99 s j \xc3 \xb8'

When what I want is:

'm ə s j ø'

Or even:

'm \xc9\x99 s j \xc3\xb8'

Upvotes: 0

Views: 1767

Answers (1)

Jan Pöschko

Reputation: 5580

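A plain string literal in Python 2 is a sequence of bytes, so join iterates over the individual UTF-8 bytes rather than over characters. You should use a unicode string instead, i.e. a literal with the u prefix: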

word = u'məsjø'

Also don't forget to declare the encoding of your Python source file on its first or second line:

# -*- coding: UTF-8 -*-

(Don't even think about using something other than UTF-8. ;))
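Putting both pieces together, a minimal sketch might look like the following. The variable name `spaced` is only illustrative, and the separator is written as a unicode literal too, so that the result stays a unicode string:

```python
# -*- coding: UTF-8 -*-

# Unicode literal: iterating over it yields characters, not UTF-8 bytes.
word = u'məsjø'

# Join with a unicode separator; one space goes between each character.
spaced = u' '.join(word)

# spaced is now u'm ə s j ø' (5 characters plus 4 spaces).
```

Whether printing `spaced` displays correctly still depends on the encoding your terminal is configured for.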

Update: This only applies to Python < 3. In Python >= 3, string literals are Unicode by default, so you would probably not have run into this problem in the first place. So if upgrading to 3.x is an option, it's the way to go -- unfortunately it might not be in some cases, because of library dependencies etc.
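If the text reaches you as UTF-8 bytes rather than as a literal in your source (for example, read from a file), the same idea applies: decode to text first, join, and encode again only if you need bytes. This is a sketch that should behave the same on Python 2.6+ and Python 3; the names `raw`, `text`, `spaced` and `encoded` are made up for illustration, and the byte values are the ones from the question's output:

```python
# UTF-8 bytes for 'məsjø', as seen in the question's output.
raw = b'm\xc9\x99sj\xc3\xb8'

# Decode to text first, so join operates on characters rather than bytes.
text = raw.decode('utf-8')
spaced = u' '.join(text)

# Encode back to UTF-8 only if a byte string is needed downstream.
# This gives b'm \xc9\x99 s j \xc3\xb8', the second form asked for.
encoded = spaced.encode('utf-8')
```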

As mentioned in the comments, encoding issues might also result from a differently configured terminal, although that was not the problem here, apparently.

Upvotes: 3
