Simon

Reputation: 520

Python writes Unicode characters incorrectly

I'm having trouble getting Python to handle my Unicode text correctly.

I've boiled it down to the following:

>>> print 'Høst'
Høst
>>> print u'Høst'
HÃ¸st
>>> u = u'Høst'
>>> u
u'H\xf8st'

sys.stdout.encoding reports UTF-8, which is most likely why the first, non-unicode print works. If I just needed to print something, that would be fine. However, I'm constructing an XML document from data in a SQL Server database, so it really needs to be real Unicode.

My data looks like perfectly good Unicode, and u'H\xf8st' looks right to me, so why does Python keep outputting it as 'HÃ¸st'?

Upvotes: 2

Views: 1643

Answers (2)

Mikhail Korobov

Reputation: 22238

Are you using IPython? Its Unicode support is broken, and I can reproduce your output with it. Try your example in the standard Python shell.
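One rough way to compare the two shells is to run the same snippet in each and see whether the reported encodings differ. This is only a sketch: the helper name describe_encodings is made up for illustration, and it uses nothing beyond the standard sys and locale modules.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings this interpreter session reports."""
    return {
        # Encoding used when printing unicode strings to the terminal;
        # can be None when output is piped or redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Interpreter-wide default (used for implicit str/unicode
        # conversions in Python 2).
        "default": sys.getdefaultencoding(),
        # Encoding suggested by the user's locale settings.
        "locale": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%s: %s" % (name, value))
```

If the "stdout" value differs between the two shells, the shell is the likely culprit rather than your data.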

Upvotes: 0

jd.

Reputation: 10958

\xc3\xb8 is the UTF-8 encoding of the Unicode character U+00F8 (ø). Read as ISO-8859-1, the same two bytes are Ã¸. Maybe your console really accepts ISO-8859-1 rather than UTF-8 as input, meaning that sys.stdout.encoding is wrong.
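To see how the same bytes can come out as either ø or Ã¸, here is a small sketch that encodes the string from the question as UTF-8 and then decodes the bytes two different ways. The helper name show_as is hypothetical, and the snippet only simulates what a misconfigured console might do; it does not inspect your actual terminal.

```python
TEXT = u'H\xf8st'  # the unicode string from the question


def show_as(text, written_as, read_as):
    """Encode text with one codec, then decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


# Bytes Python emits when stdout is believed to be UTF-8.
utf8_bytes = TEXT.encode('utf-8')

# What a console shows if it interprets those bytes as ISO-8859-1.
garbled = show_as(TEXT, 'utf-8', 'iso-8859-1')

# What a console shows if it really is UTF-8.
round_trip = show_as(TEXT, 'utf-8', 'utf-8')

print(utf8_bytes == b'H\xc3\xb8st')   # True
print(garbled == u'H\xc3\xb8st')      # True: two characters where one was meant
print(round_trip == TEXT)             # True
```

The unicode object itself is fine in all three cases; only the interpretation of the output bytes changes. For the XML file, encoding explicitly with .encode('utf-8') when writing avoids depending on the console setting at all.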

Upvotes: 3
