Reputation: 385

Conversion of Unicode

I am new to Python.

I have a unicode string containing text in Tamil.

When I use the sys.getdefaultencoding() I get the output as "Cp1252"

My problem is that when I use text = testString.decode("utf-8") I get the error "UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>"

Upvotes: 0

Answers (3)

Eric

Reputation: 474

You need to know which character encoding testString uses. If it is not UTF-8, calling decode('utf8') on it will most likely raise an error.
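A minimal sketch of that point, assuming a made-up five-character Tamil sample (written with escapes so the source file encoding does not matter). The variable names are illustrative only and are not from the question. The same text is stored as bytes in two encodings, and only the correct guess decodes cleanly:

```python
# Hypothetical sample text: five Tamil code points, written as escapes.
text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

# The same text as bytes in two different encodings.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")

# Decoding with the encoding that was actually used works.
decoded_ok = as_utf8.decode("utf-8")

# Decoding with the wrong guess fails for this sample.
try:
    as_utf16.decode("utf-8")
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True
```

Note that a wrong guess does not always raise; some byte sequences decode without error into the wrong characters, so knowing the real encoding is still necessary.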

Upvotes: 0

pahan

Reputation: 2453

Add this as the first line of your code:

# -*- coding: utf-8 -*-

Then, later in your code:

text = unicode(testString, "UTF-8")

Upvotes: 0

John Machin

Reputation: 83032

When I use the sys.getdefaultencoding() I get the output as "Cp1252"

Two comments on that: (1) it's "cp1252", not "Cp1252". Don't type from memory. (2) Whoever caused sys.getdefaultencoding() to produce "cp1252" should be told politely that that's not a very good idea.

As for the rest, let me guess. You have a unicode object that contains some text in the Tamil language. You try, erroneously, to decode it. Decode means to convert from a str object to a unicode object. Unfortunately you don't have a str object, and even more unfortunately you get bounced by one of the very few awkish/perlish warts in Python 2: it tries to make a str object by encoding your unicode string using the system default encoding. If that's 'ascii' or 'cp1252', encoding will fail. That's why you get a Unicode*En*codeError instead of a Unicode*De*codeError.
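The hidden step described above can be reproduced by hand. This is only a sketch under assumptions: the sample text and the helper name implicit_encode_error are made up for illustration, and the encode call is written out explicitly rather than triggered through decode, so the snippet behaves the same on Python 2 and Python 3.

```python
# Hypothetical sample text: five Tamil code points, written as escapes.
tamil = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"


def implicit_encode_error(text, default_encoding):
    """Return the error raised when text is encoded with default_encoding, or None."""
    try:
        # This is the step Python 2 performs silently before decoding.
        text.encode(default_encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


# Neither cp1252 nor ascii can represent Tamil characters.
err_cp1252 = implicit_encode_error(tamil, "cp1252")
err_ascii = implicit_encode_error(tamil, "ascii")

# UTF-8 can represent them, so encoding and then decoding round-trips.
utf8_bytes = tamil.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```

Both helper calls return a UnicodeEncodeError, which matches the error class in the question, while the explicit UTF-8 round trip succeeds.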

Short answer: do text = testString.encode("utf-8"), if that's what you really want to do. Otherwise please explain what you want to do, and show us the result of print repr(testString).

Upvotes: 3
