Ghilas BELHADJ

Weird behaviour when trying to print characters of a byte string

Why does this short code behave differently from one run to the next?

# -*- coding: utf-8 -*-
for c in 'aɣyul':
    print c

The output I get differs from run to run:

# nothing
---
a
---
l
---
u
l
---
a
y
u
l
...etc

I know how to fix the problem; my question is just why Python prints a different part of the string on each run instead of the same part every time.

Answers (1)

Kasravnd

You need to prefix your string with u so that Python treats it as a unicode string and iterates over characters instead of bytes; print then encodes each character correctly for the terminal:

>>> for c in u'aɣyul':
...     print c
... 
a
ɣ
y
u
l
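For readers on Python 3, here is a minimal sketch of the same fix (an assumption: the snippets in this thread are Python 2, where print is a statement). In Python 3, string literals are already Unicode text and the u prefix is accepted but has no effect, so the loop receives whole characters:

```python
# Python 3 sketch: str literals are Unicode text, and the u prefix
# (accepted again since Python 3.3) does not change anything.
text = u'aɣyul'
assert text == 'aɣyul'

# Iterating over text yields characters, so 'ɣ' arrives in one piece.
chars = list(text)
for c in chars:
    print(c)
```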

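Note that without the u prefix the literal is a byte string (str): since the source file is UTF-8, ɣ is stored as two bytes, \xc9 and \xa3, and the loop prints each byte on its own line. Neither byte is a valid UTF-8 sequence on its own, so what you see for them depends on how your terminal handles invalid input. The repr shows the two bytes: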

>>> 'aɣyul'
'a\xc9\xa3yul'
  ^   ^
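A small sketch of what that Python 2 str literal actually contains. It assumes Python 3, where the equivalent byte-string type is bytes; encoding the text as UTF-8 reproduces the same six bytes, and looping over them visits 0xC9 and 0xA3 separately:

```python
text = 'aɣyul'
raw = text.encode('utf-8')  # same bytes as the Python 2 literal 'aɣyul'

print(raw)                  # b'a\xc9\xa3yul'
print(len(text), len(raw))  # 5 6

# Iterating over bytes yields one integer per byte in Python 3,
# so 'ɣ' shows up as two separate values.
byte_values = [hex(b) for b in raw]
print(byte_values)  # ['0x61', '0xc9', '0xa3', '0x79', '0x75', '0x6c']
```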

As for why Python splits the character into two bytes: in Python 2, str instances hold raw 8-bit values, while ɣ (U+0263) does not fit in 8 bits, so its UTF-8 encoding takes two bytes, 0xC9 and 0xA3.
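To see why it takes exactly two bytes: UTF-8 stores code points from U+0080 to U+07FF in the form 110xxxxx 10xxxxxx, splitting the 11 bits of the code point across both bytes. The sketch below (Python 3; utf8_two_byte is a hypothetical helper written for illustration, not a library function) does that arithmetic by hand for U+0263 and checks it against the built-in encoder:

```python
def utf8_two_byte(code_point):
    """Hypothetical helper: encode a code point in U+0080..U+07FF
    using UTF-8's two-byte form 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError('code point needs a different UTF-8 length')
    lead = 0xC0 | (code_point >> 6)    # 110xxxxx: top 5 of the 11 bits
    cont = 0x80 | (code_point & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])

cp = ord('ɣ')                # U+0263
encoded = utf8_two_byte(cp)
print(hex(cp), encoded)      # 0x263 b'\xc9\xa3'

# The hand-built bytes match Python's own UTF-8 encoder.
assert encoded == 'ɣ'.encode('utf-8')
```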

You can also decode the bytes manually:

>>> print '\xc9\xa3'.decode('utf8')
ɣ
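The same manual decoding in Python 3, as a sketch that assumes the byte-string type is bytes. It also shows why printing the bytes one at a time goes wrong: a lone 0xC9 is an incomplete UTF-8 sequence and cannot be decoded by itself:

```python
raw = b'a\xc9\xa3yul'  # bytes of the Python 2 literal 'aɣyul'

# Decoding the two bytes together recovers the single character.
pair = b'\xc9\xa3'.decode('utf-8')
print(pair)  # ɣ

# A lone lead byte is an incomplete sequence, which is what the
# original loop handed to print on each iteration.
try:
    b'\xc9'.decode('utf-8')
    lone_ok = True
except UnicodeDecodeError:
    lone_ok = False
print('0xC9 alone decodes:', lone_ok)  # 0xC9 alone decodes: False

# Decoding the whole byte string gives back the original text.
print(raw.decode('utf-8') == 'aɣyul')  # True
```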
