user278618

Reputation: 20262

Cannot convert ASCII to UTF-8 in Python

I have the Polish word "wąż", which means "snake",

but I get it from a web service as an escaped byte string:

snake_in_polish_in_ascii="w\xc4\x85\xc5\xbc"

These are the results of my attempts:

print str(snake_in_polish_in_ascii)  # prints w─ů┼╝

snake_in_polish_in_ascii.decode('utf-8')
print str(snake_in_polish_in_ascii)  # also prints w─ů┼╝

and this code:

print str(snake_in_polish_in_ascii.encode('utf-8'))

raises an exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 1: ordinal not in range(128)

I'm using Wing IDE on Windows XP with Polish regional settings.

At the top of the file I have:

# -*- coding: utf-8 -*-

I can't find a way to resolve this. Why can't I get "wąż" in the output?

Upvotes: 1

Views: 54565

Answers (4)

Kalyan Pendyala

Reputation: 111

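By default, Python 3 source files are treated as UTF-8 encoded, even though the Python standard library itself sticks to ASCII. In Python 2, which the question uses, the default source encoding is ASCII, so the # -*- coding: utf-8 -*- line is needed for non-ASCII literals. Either way, the declaration only affects how the source file is read, not how bytes received at runtime are interpreted.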

Upvotes: 0

Willem

Reputation: 1334

Example:

snake_in_polish_in_ascii = 'w\xc4\x85\xc5\xbc'
print snake_in_polish_in_ascii.decode('cp1252').encode('utf-8')

Upvotes: 0

mouad

Reputation: 70059

This expression:

snake_in_polish_in_ascii.decode('utf-8')

does not change the string in place; it returns a new unicode object and leaves the original byte string untouched. Try it like this:

print snake_in_polish_in_ascii.decode('utf-8')
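
A minimal sketch of that point, assuming the web service really sends UTF-8 encoded bytes. The name raw is only illustrative, and the b"..." and u"..." literals are used so the snippet behaves the same on Python 2.6+ and Python 3:

```python
# -*- coding: utf-8 -*-
# Assumption: the bytes below are UTF-8, as the answers in this thread suggest.
raw = b"w\xc4\x85\xc5\xbc"

# decode() returns a new object; calling it without keeping the result
# leaves the original byte string exactly as it was.
raw.decode("utf-8")
unchanged = (raw == b"w\xc4\x85\xc5\xbc")   # True

# Keep the return value to get the decoded text.
text = raw.decode("utf-8")
is_snake = (text == u"w\u0105\u017c")       # True: u"w\u0105\u017c" is "wąż"

# Five bytes decode to three characters.
byte_count, char_count = len(raw), len(text)
```

The byte string stays five bytes long, while the decoded text has three characters, because "ą" and "ż" each take two bytes in UTF-8.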

As for why print snake_in_polish_in_ascii shows w─ů┼╝: your terminal uses the cp852 encoding (Central and Eastern Europe), so the raw UTF-8 bytes are displayed as cp852 characters. You can reproduce it like this:

>>> print snake_in_polish_in_ascii.decode("cp852")
w─ů┼╝
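
The same five bytes can be compared under both codecs. This is only a sketch: the variable names are made up for illustration, and it assumes the console code page is cp852 as described above:

```python
# -*- coding: utf-8 -*-
raw = b"w\xc4\x85\xc5\xbc"

# Interpreted as UTF-8, the bytes form the intended word.
as_utf8 = raw.decode("utf-8")     # u"w\u0105\u017c", i.e. "wąż"

# Interpreted byte by byte as cp852, each byte becomes its own character,
# which yields the box-drawing garbage from the question.
as_cp852 = raw.decode("cp852")    # u"w\u2500\u016f\u253c\u255d", i.e. "w─ů┼╝"

# Both interpretations come from identical bytes; only the codec differs.
same_bytes = as_utf8.encode("utf-8") == as_cp852.encode("cp852")
```

Since the bytes are identical in both cases, the data from the web service is intact; only the interpretation at display time is wrong.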

Upvotes: 8

Vladimir Keleshev

Reputation: 14285

>>> i="w\xc4\x85\xc5\xbc"
>>> print i.decode('utf-8')
wąż
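
This also explains the UnicodeDecodeError in the question. In Python 2, calling .encode('utf-8') on a byte string first decodes it implicitly with the ASCII codec, and that step fails on the first non-ASCII byte. The sketch below makes the hidden step explicit so it behaves the same on Python 2 and 3; the names raw and error are illustrative only:

```python
raw = b"w\xc4\x85\xc5\xbc"

# Python 2's str.encode('utf-8') performs this ASCII decode behind the scenes.
error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# The failure is reported for byte 0xc4 at position 1, as in the question.
failed_position = error.start
message = str(error)
```

Decoding with the codec the bytes were actually written in ('utf-8' here) avoids the implicit ASCII step altogether.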

Upvotes: 5
