backslash escaped string with accented letters to ascii

Question

I have a string that represents a dict. The format of the string includes characters like "\x7B" (sorry I'm not sure what to call this - backslash encoding?) It also contains accented characters in the form "\u00fa" (again, sorry, I'm not sure what this is called). I would like to:

Change all "\x7B" style characters to their corresponding "normal" characters, here it would be "{"
Change all "\u00fa" style characters to their closest ASCII equivalents (i.e. no accent): here "\u00fa" is "ú", so it would become "u".

It's difficult to tell what's going on behind the scenes, because when I print these strings Python automatically converts the "\x7B" style characters to their "normal" equivalents.

For example, I would like to be able to convert '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D' to '{"h":"Raul"}'

Side note: How would I be able to view '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D' as '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D' rather than '{"h":"Ra\u00fal"}'?

Also, if you could include the proper names of each of the string formats (encodings?) so I could update the question name to make it more suitable for future reference, that'd be great.

Prem Anand · Accepted Answer

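You can use json.loads to parse the string into a dict, normalize its text with unicodedata.normalize('NFKD', ...) so that each accented character is split into a base letter plus a combining mark, and then encode to ASCII with errors ignored to drop the marks. Note that the result below is the str() of the dict, so it uses single quotes rather than JSON's double quotes.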

>>> import unicodedata
>>> import json
>>> 
>>> s = '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D'
>>> unicodedata.normalize('NFKD', str(json.loads(s))).encode('ascii','ignore').decode()
"{'h': 'Raul'}"
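
If you need the exact JSON text '{"h":"Raul"}' from the question, or want to see the escaped form itself, something along these lines may help. This is only a sketch: the names strip_accents, raw, json_text and ascii_json are made up for illustration. On terminology, "\x7B" is a hex escape sequence inside a Python string literal, and "\u00fa" is a Unicode escape (here, the JSON \uXXXX form). A raw string literal (r'...') keeps the backslashes as typed, which covers the side note.

```python
import json
import unicodedata


def strip_accents(text):
    """Hypothetical helper: decompose accents (NFKD), then drop non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# A raw string literal keeps every backslash, so printing it shows the
# escaped form exactly as it was typed.
raw = r'\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D'
print(raw)        # \x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D

# Resolve the \xNN hex escapes. \x5C becomes a literal backslash, so the
# result is JSON text that still contains the \u00fa escape.
json_text = raw.encode('ascii').decode('unicode_escape')
print(json_text)  # {"h":"Ra\u00fal"}

# The JSON parser resolves \u00fa; dump back to compact JSON without
# re-escaping, then strip the accents from the whole text.
data = json.loads(json_text)
ascii_json = strip_accents(
    json.dumps(data, ensure_ascii=False, separators=(',', ':'))
)
print(ascii_json)  # {"h":"Raul"}
```

Be aware that encode('ascii', 'ignore') silently drops any character that has no ASCII decomposition (for example "ø"), so check the output if your data contains more than simple accented letters.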
