Reputation: 4574
I have a unicode object userId that contains a non-ASCII character. I am trying to quote this string to use it as an XML attribute, using a function from xml.sax.saxutils:
quoteattr(userId)
which gives me this error:
'ascii' codec can't encode character u'\xa0'
I think I have read all the Python unicode information on the web, including https://docs.python.org/2/howto/unicode.html#the-unicode-type
but I still do not understand what the problem is. I already have a unicode object. I don't care about encodings. Encodings matter when I want to convert from unicode to a byte array or vice versa. Nowhere in my code am I dealing with raw byte arrays.
Basically, the big question is: why does quoteattr want to encode something using the ASCII encoding if I gave it a unicode object and expect a unicode object back?
I worked around the problem by doing userId.encode('ascii', 'ignore'), but this obviously throws away any non-ASCII characters.
How can I get my unicode string quoted?
The variable is assigned with userId = ndb.StringProperty() using Google App Engine.
Upvotes: 1
Views: 310
Reputation: 126
Since you mentioned Google App Engine, I played with an example using it:
from xml.sax.saxutils import quoteattr
from google.appengine.ext import ndb
from google.appengine.ext.ndb.model import Model
class Foo(Model):
    bar = ndb.StringProperty()
foo=Foo(bar='''barç"á<&' >
''')
print type(foo.bar)
print quoteattr(foo.bar)
The problem here is that foo.bar is a str, so you run into the encoding problem. There are two ways of solving it:
1) Use the u prefix. So
foo=Foo(bar='''barç"á<&' >
''')
Becomes
foo=Foo(bar=u'''barç"á<&' >
''')
2) Add these two lines at the beginning of your scripts:
# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals
I prefer the second approach. I configure PyCharm to add these lines to every newly created .py file.
Note that this problem only occurs when setting a model property from a literal. Webapp2 and most frameworks used on GAE convert request data to unicode, so you don't have to bother with encoding/decoding.
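If a value can still arrive as a byte string from somewhere else, one option is to decode it before quoting. Here is a minimal sketch of that idea. quote_text_attr is a hypothetical helper name (it is not part of xml.sax.saxutils), and UTF-8 is only an assumption about how the bytes were encoded:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from xml.sax.saxutils import quoteattr


def quote_text_attr(value, encoding="utf-8"):
    # Hypothetical helper, not part of xml.sax.saxutils.
    # Byte strings are decoded first (UTF-8 is an assumption) so that
    # quoteattr only ever sees text.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return quoteattr(value)


# With unicode_literals this literal is text, not bytes.
quoted_literal = quote_text_attr('barç"á')

# A byte string (UTF-8 for "ç" followed by U+00A0) is decoded before quoting.
quoted_bytes = quote_text_attr(b"\xc3\xa7\xc2\xa0")
```

Unlike encode('ascii', 'ignore'), this keeps the non-ASCII characters in the quoted result.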
Upvotes: 2