kaalus

Reputation: 4574

Cannot quote unicode string in Python

I have a unicode object userId that contains a non-ascii character. I am trying to quote this string to use it as an XML attribute using a function from xml.sax.saxutils:

quoteattr(userId)

which gives me this error:

'ascii' codec can't encode character u'\xa0'

I think I have read all the Python unicode information on the web, including https://docs.python.org/2/howto/unicode.html#the-unicode-type

but I still do not understand what the problem is. I already have a unicode object. I don't care about encodings. Encodings matter when I want to convert from unicode to a byte array or vice versa. Nowhere in my code am I dealing with raw byte arrays.

Basically the big question is: why does quoteattr want to encode something using the ascii encoding if I gave it a unicode object and expect a unicode object back?

I worked around the problem by doing userId.encode('ascii', 'ignore'), but this obviously throws away any non-ascii characters.

How can I get my unicode string quoted?

The variable is assigned with userId = ndb.StringProperty() using Google App Engine.

Upvotes: 1

Views: 310

Answers (1)

Renzo

Reputation: 126

Since you mentioned Google App Engine, I played with an example using it:

from xml.sax.saxutils import quoteattr
from google.appengine.ext import ndb
from google.appengine.ext.ndb.model import Model


class Foo(Model):
    bar = ndb.StringProperty()


foo = Foo(bar='''barç"á<&'  >


''')

print type(foo.bar)

print quoteattr(foo.bar)

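The problem here is that foo.bar is a str, so you are going to have the encoding problem. In Python 2 a plain '...' literal is a byte string (str), and Python falls back to the ascii codec whenever it has to convert implicitly between str and unicode. There are two ways of solving it: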

1) Use the u prefix, so that

foo = Foo(bar='''barç"á<&'  >


    ''')

becomes

foo = Foo(bar=u'''barç"á<&'  >


    ''')

2) Add these two lines at the beginning of your scripts:

# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals

I prefer the second approach. I configure PyCharm to add these lines to every newly created .py file.

Note that this problem only occurs when setting a model property with a literal. webapp2 and most frameworks used on GAE convert request data to unicode, so you don't have to worry about encoding/decoding.
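
To make the general idea concrete, here is a minimal sketch that does not depend on App Engine. The names user_id, raw and xml_bytes and the sample values are made up for illustration, and the byte string is assumed to be UTF-8. It keeps everything as unicode while quoting and only encodes at the point where the result leaves the program:

```python
# -*- coding: utf-8 -*-
from xml.sax.saxutils import quoteattr

# Hypothetical stand-in for the model property, written as a unicode literal
# with escapes so the source file encoding does not matter.
user_id = u'bar\xe7"\xe1<&\'  >'

# A unicode string goes in and a unicode string comes back.
quoted = quoteattr(user_id)

# Bytes coming from elsewhere (assumed here to be UTF-8) are decoded
# to unicode before they are quoted.
raw = b'caf\xc3\xa9\xc2\xa0x'
quoted_raw = quoteattr(raw.decode('utf-8'))

# Encode explicitly, and only when the result has to be written out.
xml_bytes = (u'<user id=%s/>' % quoted).encode('utf-8')
```

Unlike encode('ascii', 'ignore'), this keeps the non-ascii characters: they are still present in quoted and are written out as UTF-8 in xml_bytes.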

Upvotes: 2
