Reputation: 343
I have a string that represents a dict. The format of the string includes characters like "\x7B" (sorry I'm not sure what to call this - backslash encoding?) It also contains accented characters in the form "\u00fa" (again, sorry, I'm not sure what this is called). I would like to:
It's difficult to tell whats going on behind the scenes as when I print these strings it automatically converts the "\x7B" style characters to their "normal" equivalents.
For example, I would like to be able to convert '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D'
to '{"h":"Raul"}'
Side note:
How would I be able to view '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D'
as '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D'
rather than '{"h":"Ra\\u00fal"}'
?
Also, if you could include the proper names of each of the string formats (encodings?) so I could update the question name to make it more suitable for future reference, that'd be great.
Upvotes: 2
Views: 946
Reputation: 2537
You can use json.loads
to convert the encoded string to dict and then normalize it and finally convert it to equivalent ascii
>>> import unicodedata
>>> import json
>>>
>>> s = '\x7B\x22h\x22\x3A\x22Ra\x5Cu00fal\x22\x7D'
>>> unicodedata.normalize('NFKD', str(json.loads(s))).encode('ascii','ignore').decode()
"{'h': 'Raul'}"
Upvotes: 3