Reputation: 734
I have this string:
V posledn\u00edch m\u011bs\u00edc\u00edch se bezpe\u010dnostn\u00ed situace v Libyi zna\u010dn\u011b zhor\u0161ila, o \u010dem\u017e sv\u011bd\u010d\u00ed i ned\u00e1vn\u00e9 n\u00e1hl\u00e9 opu\u0161t\u011bn\u00ed zem\u011b nejen \u010desk\u00fdmi diplomaty. Libyi hroz\u00ed nekontrolovan\u00fd rozpad a nekone\u010d
Which should read "V posledních měsících se ..." so \u00ed is í and \u011b is ě.
Any idea how to decode this in Python? The string comes from JavaScript code that I am parsing in Python. I could write my own ad hoc solution, since not many characters are escaped (there are only twelve or so accented characters in Czech), but that seems ugly.
Upvotes: 5
Views: 6126
Reputation: 251578
Decode it using the 'unicode-escape' codec. If x is your string, use x.decode('unicode-escape').
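For readers on Python 3, where str has no decode method, here is a minimal sketch of the same idea. It assumes the input contains only ASCII characters plus \uXXXX escapes, as in the question; the function name decode_js_escapes is hypothetical.

```python
def decode_js_escapes(text):
    """Turn literal \\uXXXX sequences into the characters they stand for.

    Assumes `text` is pure ASCII apart from the escapes; encode("ascii")
    raises UnicodeEncodeError otherwise.
    """
    # Python 2 equivalent for a byte string: text.decode('unicode-escape')
    return text.encode("ascii").decode("unicode_escape")


# Raw string literal, so the backslashes reach the function untouched.
raw = r"V posledn\u00edch m\u011bs\u00edc\u00edch"
decoded = decode_js_escapes(raw)
print(decoded)
```

Note that this codec also interprets other backslash sequences such as \n and \t, which may or may not be what you want for text taken from JavaScript source.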
Upvotes: 11
Reputation: 390
I had a similar issue, which was solved by:
unicodedata.normalize('NFD', my_string.decode('unicode-escape')).encode('ascii','ignore')
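Be aware that this one-liner does more than decode: it also throws the accents away, so í becomes i and ě becomes e. A rough Python 3 sketch of the same steps, assuming ASCII input with \uXXXX escapes (the function name strip_accents_from_escaped is hypothetical):

```python
import unicodedata


def strip_accents_from_escaped(text):
    """Decode \\uXXXX escapes, then reduce the result to plain ASCII."""
    # Step 1: turn the escapes into real characters (input assumed ASCII).
    decoded = text.encode("ascii").decode("unicode_escape")
    # Step 2: NFD splits each accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", decoded)
    # Step 3: encoding to ASCII with errors="ignore" drops the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


raw = r"V posledn\u00edch m\u011bs\u00edc\u00edch"
ascii_only = strip_accents_from_escaped(raw)
print(ascii_only)
```

This is only appropriate when an unaccented approximation is acceptable; it will not give back "V posledních měsících" as the question asks.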
Upvotes: 0
Reputation: 376012
If it is JavaScript code, then perhaps it's actually JSON, and you can use json.loads to decode it.
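A small sketch of what that could look like, assuming the value really is valid JSON. A bare fragment like the one in the question is not a JSON document on its own, so here it is wrapped in double quotes first, which only works if the fragment contains no unescaped quotes or control characters; the name decode_as_json_string is hypothetical.

```python
import json


def decode_as_json_string(fragment):
    """Parse `fragment` as the body of a JSON string literal."""
    # Wrapping in quotes is an assumption about the input: it must not
    # contain unescaped double quotes or raw control characters.
    return json.loads('"' + fragment + '"')


raw = r"V posledn\u00edch m\u011bs\u00edc\u00edch"
decoded = decode_as_json_string(raw)
print(decoded)

# If the source is a whole JSON object, json.loads handles the escapes directly.
obj = json.loads(r'{"text": "zem\u011b"}')
print(obj["text"])
```

Unlike the 'unicode-escape' codec, json.loads leaves any non-ASCII characters already present in the input intact, which makes it the safer choice when the data is genuinely JSON.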
Upvotes: 1