Xar

Reputation: 7940

Difficulties with character-encoding in Python

I am receiving data via GET request parameters. Some of these parameters are strings, and I'm having a tough time displaying them correctly, I guess because of encoding issues.

This is an example of what I receive:

{'id_origen': u'9', 'apellidos': u'\xd1\xe9rez', 'nombre': u'Pimp\xe1m'}

You can see that the value for the key 'apellidos' isn't being received properly. It appears as

u'\xd1\xe9rez'

instead of

Núñez.

I tried to solve this issue in a very primitive way, replacing each occurrence of a character like "\xe1" with "á", for example. But that also gives me problems. This is the code I came up with:

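# -*- coding: utf-8 -*-
import logging
logger = logging.getLogger(__name__)

# Map each escaped character to the accented letter it should display as
tabla = {'\xE1': 'á', '\xE9': 'é', '\xED': 'í', '\xF3': 'ó', '\xFA': 'ú'}

logger.info("Argument value before the urldecode loop: %s" % valor)
for k, v in tabla.iteritems():
    if k in valor:
        # str.replace returns a new string, so the result must be reassigned
        valor = valor.replace(k, v)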

Of course, it doesn't work as I had expected.
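Two facts about the table approach can be checked directly. The sketch below is Python 3 (where every str is Unicode) and reuses the table and the 'nombre' value from the question; the name sin_acento is just an illustrative variable. It shows that each escape already denotes the accented letter it is mapped to, and that str.replace leaves the original string untouched and returns a new one.

```python
# Python 3 sketch: the question's table, with every str being Unicode.
tabla = {'\xE1': 'á', '\xE9': 'é', '\xED': 'í', '\xF3': 'ó', '\xFA': 'ú'}

# Each escape already is the accented letter, so the table maps
# every character to itself.
print(all(k == v for k, v in tabla.items()))  # True

valor = 'Pimp\xe1m'
valor.replace('\xe1', 'a')                # result discarded: valor is unchanged
print(valor)                              # Pimpám
sin_acento = valor.replace('\xe1', 'a')   # keep the returned copy instead
print(sin_acento)                         # Pimpam
```

In Python 2 the same table behaves worse: the keys are byte strings, so comparing them with a unicode value forces an implicit ASCII decode, which fails on bytes like '\xE1'.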

What would be the appropriate treatment for this type of character encoding that I'm receiving?

Upvotes: 0

Views: 85

Answers (2)

hamstergene

Reputation: 24439

The values are received correctly (that particular value is “Ñérez” by the way, not “Núñez”).
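One way to confirm which characters a string really holds is to list the code point and Unicode name of each character. A short Python 3 sketch using the standard unicodedata module on the value from the question:

```python
import unicodedata

apellidos = u'\xd1\xe9rez'  # value copied from the question's dict

# The escapes are ordinary code points: \xd1 is Ñ and \xe9 is é.
for ch in apellidos:
    print('U+%04X %s' % (ord(ch), unicodedata.name(ch)))

print(apellidos == 'Ñérez')  # True
```

The first line printed is `U+00D1 LATIN CAPITAL LETTER N WITH TILDE`, so the data says "Ñérez", not "Núñez".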

When Python dumps the contents of a list or dict to the console, every string item is displayed as its representation (the result of the repr() function), not as the original string. For example:

>>> print [0, u"é", 0]
[0, u'\xe9', 0]

I believe the main point of this is to make values directly reusable by copy-pasting them back into code. Because strings can contain all kinds of quotes and backslashes, and because terminals, webpages, etc. may not be able to display non-ASCII characters, printing the raw string contents would not do the job.
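The examples in this answer use Python 2. In Python 3, repr() leaves printable non-ASCII characters as they are, and the built-in ascii() produces the escaped form (without the u prefix). A small sketch contrasting the three, using the dict from the question:

```python
datos = {'id_origen': u'9', 'apellidos': u'\xd1\xe9rez', 'nombre': u'Pimp\xe1m'}
apellidos = datos['apellidos']

# The text itself, which is what a page or terminal should show.
print(apellidos)         # Ñérez
# Python 3's repr() keeps printable non-ASCII characters unescaped.
print(repr(apellidos))   # 'Ñérez'
# ascii() escapes non-ASCII code points, like Python 2's repr().
print(ascii(apellidos))  # '\xd1\xe9rez'
print(ascii(datos))      # {'id_origen': '9', 'apellidos': '\xd1\xe9rez', 'nombre': 'Pimp\xe1m'}
```

Whichever version is used, the escapes only appear in the representation; the string passed to a template or to print is unchanged.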

The real text inside those strings is okay:

>>> print u'\xd1\xe9rez'
Ñérez
>>> 

Upvotes: 2

remram

Reputation: 5203

u'\xd1\xe9rez' doesn't seem to be the string Núñez but rather Ñérez. Are you sure about what your data is?

Other than that, your data is already unicode. There is no encoding left to deal with, because a unicode object already holds decoded characters; whatever fix you think is necessary should happen upstream. Is your web framework giving you these values?
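The question doesn't say which framework builds the dict, so as a stand-in this Python 3 sketch uses the standard library's urllib.parse.parse_qs on a hypothetical query string. It assumes the browser percent-encoded the values as UTF-8, and it shows that decoding happens at the point where the query string is parsed.

```python
from urllib.parse import parse_qs, quote

# Hypothetical raw query string: the question's values percent-encoded as UTF-8.
query = 'id_origen=9&apellidos=' + quote('Ñérez') + '&nombre=' + quote('Pimpám')
print(query)  # id_origen=9&apellidos=%C3%91%C3%A9rez&nombre=Pimp%C3%A1m

# parse_qs percent-decodes, then decodes the bytes as UTF-8 by default.
# Each key maps to a list because a parameter may repeat.
params = parse_qs(query)
print(params['apellidos'][0])  # Ñérez

# Decoding with the wrong charset upstream yields mojibake, not escapes.
print(repr(parse_qs(query, encoding='latin-1')['apellidos'][0]))  # 'Ã\x91Ã©rez'
```

If the parser picks the right charset, the handler receives clean unicode strings and no character-replacement table is needed afterwards.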

Upvotes: 0
