yKim

Reputation: 63

os.listdir() returns unexpected values for Korean folder names (encoding issue)

I have folders whose names include a few Korean characters.

When I read the list of folder names with os.listdir(), the returned values are completely different from a normal string containing the same text.

Example:

What causes the difference? My guess is that os.listdir() returns the names in some other encoding.

Upvotes: 1

Views: 157

Answers (1)

Karl Knechtel

Reputation: 61654

Both of these are the same encoding (UTF-8), but...

"누" = (\xe1\x84\x82\xe1\x85\xae)

This byte sequence represents the character as a sequence of two jamo (the 24 building blocks of the Korean (hangeul) alphabet):

>>> import unicodedata
>>> x = b'\xe1\x84\x82'.decode('utf-8')
>>> y = b'\xe1\x85\xae'.decode('utf-8')
>>> unicodedata.name(x)
'HANGUL CHOSEONG NIEUN'
>>> unicodedata.name(y)
'HANGUL JUNGSEONG U'

"누" in python console = (\xeb\x88\x84)

Whereas when you actually type the character in a console window, you (apparently) get a precomposed character:

>>> z = b'\xeb\x88\x84'.decode('utf-8')
>>> unicodedata.name(z)
'HANGUL SYLLABLE NU'
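
If the goal is to compare a name returned by os.listdir() with a string typed in the console, one possible approach is to normalize both to the same Unicode normalization form before comparing. The following is only a minimal sketch under that assumption: it reuses the two byte sequences from the question, and listdir_nfc is a hypothetical helper name introduced here for illustration, not part of the os module.

```python
import os
import unicodedata

# The two byte sequences from the question, decoded as UTF-8.
decomposed = b'\xe1\x84\x82\xe1\x85\xae'.decode('utf-8')   # two jamo
precomposed = b'\xeb\x88\x84'.decode('utf-8')              # one syllable

# They render the same but are different code point sequences.
print(decomposed == precomposed)            # False
print(len(decomposed), len(precomposed))    # 2 1

# NFC composes the jamo into the precomposed syllable.
print(unicodedata.normalize('NFC', decomposed) == precomposed)  # True


def listdir_nfc(path):
    """Hypothetical helper: os.listdir() with every name normalized to NFC."""
    return [unicodedata.normalize('NFC', name) for name in os.listdir(path)]
```

Normalizing to NFC is a choice made for this sketch; normalizing both sides to NFD would work just as well, as long as the same form is applied to both strings being compared.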

Upvotes: 2
