Reputation: 15680
I have a string object of unspecified type. It will match types.StringTypes, but could be a types.StringType or a types.UnicodeType. I'm not sure which I'll receive, and I can't necessarily control what comes in.
My issue occurs when I have a non-ASCII character in a StringType (str) string and pass it into misaka (which is a Sundown parser).
In this example, we're dealing with unichr(8250) / u'\u203a', which has caused this a handful of times in my error logs:
a = "›"
b = u"›"
print type(a) # <type 'str'>
print type(b) # <type 'unicode'>
print a # fine
print b # fine
import misaka
markdown_renderer = misaka.HtmlRenderer()
renderer = misaka.Markdown( markdown_renderer )
try:
    print renderer.render( a )  # this will fail
    print "GOOD a"
except:
    print "FAILED a"
try:
    print renderer.render( b )  # this will pass
    print "GOOD b"
except:
    print "FAILED b"
I can't figure out how to turn the 'a' object into something that misaka won't have issues with. 'b' always works. Can anyone offer a suggestion?
Upvotes: 1
Views: 226
Reputation: 599866
If a str always fails but a unicode always succeeds, you presumably need to decode your str object before passing it in. The trick is knowing the encoding: if you do, you can just do (for example) a.decode('utf-8'). But if you don't know, then I understand the chardet package does a reasonable job of guessing, though note that guessing is all you can do.
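As a rough sketch of that approach (not something misaka itself provides): the helper name to_text and the choice of UTF-8 as the first encoding to try are my assumptions, and chardet is an optional third-party package that is only used if it happens to be installed.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of misaka or chardet.

def to_text(value, encoding="utf-8"):
    """Return value as text (unicode on Python 2, str on Python 3)."""
    # On Python 2, bytes is an alias for str, so this catches byte strings.
    if not isinstance(value, bytes):
        return value  # already unicode text, nothing to do

    # First try the encoding we assume the input uses.
    try:
        return value.decode(encoding)
    except UnicodeDecodeError:
        pass

    # Fall back to a guess from chardet, if it is available.
    try:
        import chardet  # third-party, optional
    except ImportError:
        chardet = None
    if chardet is not None:
        guess = chardet.detect(value)["encoding"]  # may be None
        if guess:
            try:
                return value.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass

    # Last resort: keep what decodes and replace the rest with U+FFFD.
    return value.decode(encoding, "replace")
```

With something like this in place, the call in the question would become renderer.render(to_text(a)), and unicode input such as b passes through unchanged. If the input is not UTF-8 and chardet guesses wrong, the result will still be unicode but may contain the wrong characters.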
Upvotes: 2