Reputation: 385
I am new to Python.
I have a unicode string containing Tamil text.
When I use the sys.getdefaultencoding() I get the output as "Cp1252"
My problem is that when I call text = testString.decode("utf-8"), I get the error "UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to <undefined>"
Upvotes: 0
Views: 929
Reputation: 474
You need to know which character encoding testString uses. If it is not UTF-8, decode('utf8') will raise an error.
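As a rough illustration of that point, the sketch below decodes the same bytes with a wrong and a right encoding. The sample text, the helper name try_decode, and the candidate list are all assumptions made for this example; trying encodings in turn is only a guess, since a wrong encoding can sometimes decode without raising.

```python
# Hypothetical sample: the word "Tamil" in Tamil script, stored as UTF-16-LE bytes.
sample_text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
raw_bytes = sample_text.encode("utf-16-le")


def try_decode(data, candidates=("utf-8", "utf-16-le")):
    """Return (encoding_name, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    return None, None


used_encoding, decoded = try_decode(raw_bytes)
print(used_encoding)
```

Here the UTF-8 attempt fails because the bytes were never UTF-8, which is the failure the answer above warns about.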
Upvotes: 0
Reputation: 2453
Add this as the first line of your code:
# -*- coding: utf-8 -*-
Then, later in your code:
text = unicode(testString,"UTF-8")
Upvotes: 0
Reputation: 83032
When I use the sys.getdefaultencoding() I get the output as "Cp1252"
Two comments on that: (1) it's "cp1252", not "Cp1252". Don't type from memory. (2) Whoever caused sys.getdefaultencoding() to produce "cp1252" should be told politely that that's not a very good idea.
As for the rest, let me guess. You have a unicode object that contains some text in the Tamil language. You try, erroneously, to decode it. Decode means to convert from a str object to a unicode object. Unfortunately you don't have a str object, and even more unfortunately you get bounced by one of the very few awkish/perlish warts in Python 2: it tries to make a str object by encoding your unicode string using the system default encoding. If that's 'ascii' or 'cp1252', encoding will fail. That's why you get a Unicode*En*codeError instead of a Unicode*De*codeError.
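A minimal sketch of that hidden step, written so it runs on both Python 2 and Python 3. The sample value tamil_text and the helper implicit_encode_fails are hypothetical names used only for illustration; the helper performs explicitly the encode that Python 2 attempts implicitly before decode.

```python
# Hypothetical sample: the word "Tamil" in Tamil script.
tamil_text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"


def implicit_encode_fails(text, default_encoding):
    """Report whether encoding `text` with `default_encoding` raises UnicodeEncodeError."""
    try:
        text.encode(default_encoding)
    except UnicodeEncodeError:
        return True
    return False


# Neither codec has any mapping for Tamil characters.
ascii_fails = implicit_encode_fails(tamil_text, "ascii")
cp1252_fails = implicit_encode_fails(tamil_text, "cp1252")
print(ascii_fails)
print(cp1252_fails)
```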
Short answer: do text = testString.encode("utf-8"), if that's what you really want to do. Otherwise please explain what you want to do, and show us the result of print repr(testString).
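For reference, here is a small round trip showing the direction of each call, assuming a hypothetical sample value in place of the asker's testString: encode turns text into bytes, and decode turns those bytes back into text.

```python
# Hypothetical stand-in for testString: the word "Tamil" in Tamil script.
tamil_text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

utf8_bytes = tamil_text.encode("utf-8")      # text -> bytes
round_tripped = utf8_bytes.decode("utf-8")   # bytes -> text

# repr() shows the raw bytes, which is what the answer asks the poster to share.
print(repr(utf8_bytes))
```

Each of these Tamil characters takes three bytes in UTF-8, so the five-character sample becomes fifteen bytes.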
Upvotes: 3