Encoding and Decoding of text in Python

Question

I am working on a Python script (App Engine) that takes text input from the user and stores it in the database for redistribution later.

I don't know in advance how the incoming text is encoded: it may or may not already be URL-encoded, and I need it to end up encoded exactly once.

Example Texts from clients:

This%20is%20a%20test
This is a test

In Python, I thought I could decode the text and then encode it, so that both samples become:

This%20is%20a%20test
This%20is%20a%20test

The code that I am using is as follows:

import urllib

#
# Encode as UTF-8
#
pl = pl.encode('UTF-8')

#
# Unquote the string, then requote it so it is encoded exactly once
#
pl = urllib.quote(urllib.unquote(pl))

Here, pl is the value of the POST parameter for the payload.
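
To make the goal concrete, here is a minimal sketch of the decode-then-encode step. It is an illustration under assumptions, not the script itself: it targets Python 3's urllib.parse rather than the Python 2 urllib module used above, and normalize_payload is a hypothetical name.

```python
from urllib.parse import quote, unquote


def normalize_payload(text):
    """Return text percent-encoded once (hypothetical helper, Python 3 only)."""
    # unquote() returns text without %XX escapes unchanged and decodes
    # any %XX escapes it finds as UTF-8.
    decoded = unquote(text)
    # quote() encodes the str as UTF-8 and percent-escapes it,
    # leaving "/" unescaped by default.
    return quote(decoded)


print(normalize_payload("This%20is%20a%20test"))
print(normalize_payload("This is a test"))
```

Both calls should print the same encoded string. One limitation of this approach: raw input that happens to contain a literal percent sign followed by two hex digits cannot be told apart from input that is already encoded.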

The Issue

The issue is that the text sometimes contains non-ASCII characters (Chinese or Arabic, for example), and then I get the following error:

'ascii' codec can't encode character u'\xc3' in position 0: ordinal not in range(128)
    ..snip..
    return codecs.utf_8_decode(input, errors, True)
 UnicodeEncodeError: 'ascii' codec can't encode character u'\xc3' in position 0: ordinal not in range(128)

Does anyone know the best way to process the string, given the above issue?

Thanks.
