Trying to understand a Python Unicode exception

Question

I have a string object of unspecified type. It will match types.StringTypes, but it could be either a types.StringType or a types.UnicodeType; I'm not sure which I'll receive, and I can't necessarily control what comes in.

My issue occurs when I have a non-ASCII character in a StringType (str) object and pass it to misaka (which is a Sundown parser).

In this example, we're dealing with unichr(8250) / u'\u203a', which has caused this a handful of times in my error logs:

a = "›"
b = u"›"

print type(a) # <type 'str'>
print type(b) # <type 'unicode'>

print a # fine
print b # fine

import misaka

markdown_renderer = misaka.HtmlRenderer()
renderer = misaka.Markdown( markdown_renderer )

try:
    print renderer.render( a )
    #this will fail
    print "GOOD a"
except:
    print "FAILED a"

try:
    print renderer.render( b )
    #this will pass
    print "GOOD b"
except:
    print "FAILED b"

I can't figure out how to turn the "a" object into something that misaka won't have issues with; "b" always works. Can anyone offer a suggestion?

Daniel Roseman · Accepted Answer

If a str always fails but a unicode always succeeds, you presumably need to decode your str object before passing it in. The trick is knowing the encoding: if you do, you can just call (for example) a.decode('utf-8'). If you don't, I understand the chardet package does a reasonable job of guessing, but note that guessing is all you can do.
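
As a rough sketch of the above, something like the following could normalize the input before it reaches the renderer. The helper name to_unicode is made up for illustration, and UTF-8 as the default encoding is an assumption about your data, not something I can know. Passing encoding=None falls back to chardet's guess, which may be wrong.

```python
def to_unicode(value, encoding="utf-8"):
    """Return `value` as a unicode string (hypothetical helper).

    Byte strings (`str` on Python 2, `bytes` on Python 3) are decoded;
    anything that is already unicode is returned unchanged.
    """
    if not isinstance(value, bytes):
        # Already unicode, nothing to decode.
        return value
    if encoding is None:
        # Encoding unknown: ask chardet for a best guess (third-party package).
        import chardet
        guess = chardet.detect(value)
        # chardet may return None for the encoding; assume UTF-8 in that case.
        encoding = guess["encoding"] or "utf-8"
    return value.decode(encoding)


# The UTF-8 bytes for U+203A, i.e. what "›" holds in a UTF-8 source file.
a = b"\xe2\x80\xba"
b = u"\u203a"

decoded = to_unicode(a)
same = to_unicode(b)
```

With the renderer from the question, the call would then be renderer.render(to_unicode(a)), which should behave like the "b" case, assuming the bytes really are UTF-8. If they are not, decode() raises UnicodeDecodeError, which at least tells you the assumed encoding was wrong.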
