sup

Reputation: 734

How do I decode Unicode escapes from JavaScript code in Python?

I have this string:

V posledn\u00edch m\u011bs\u00edc\u00edch se bezpe\u010dnostn\u00ed situace v Libyi zna\u010dn\u011b zhor\u0161ila, o \u010dem\u017e sv\u011bd\u010d\u00ed i ned\u00e1vn\u00e9 n\u00e1hl\u00e9 opu\u0161t\u011bn\u00ed zem\u011b nejen \u010desk\u00fdmi diplomaty. Libyi hroz\u00ed nekontrolovan\u00fd rozpad a nekone\u010d

It should read "V posledních měsících se ...", so \u00ed is í and \u011b is ě.

Any idea how to decode this in Python? The string comes from JavaScript code that I am parsing in Python. I could write my own ad hoc solution, since not many characters are escaped (there are only twelve or so accented characters in Czech), but that seems ugly.

Upvotes: 5

Views: 6126

Answers (3)

BrenBarn

Reputation: 251578

Decode it using the 'unicode-escape' codec. If x is your string (a byte string in Python 2), use x.decode('unicode-escape').
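In Python 3, str has no decode method, so the same idea needs a round trip through bytes. The following is a minimal sketch under that assumption; the helper name decode_js_escapes is made up for illustration, and it assumes the input contains only ASCII characters plus the escapes.

```python
def decode_js_escapes(text):
    # Convert literal backslash-u sequences into the characters they name.
    # Python 3 str has no .decode(), so encode to bytes first.
    # Encoding as ASCII assumes the input holds no raw non-ASCII characters;
    # if it does, this raises UnicodeEncodeError.
    return text.encode("ascii").decode("unicode_escape")


# A raw string keeps the backslashes literal, as they would be in scraped text.
raw = r"V posledn\u00edch m\u011bs\u00edc\u00edch"
print(decode_js_escapes(raw))
```

Note that the codec also interprets other Python-style escapes such as \n and \x41, which may or may not be what you want for JavaScript input.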

Upvotes: 11

Patrick Sampaio

Reputation: 390

I had a similar issue, which I solved with:

unicodedata.normalize('NFD', my_string.decode('unicode-escape')).encode('ascii','ignore')
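Be aware that this one-liner does more than decode: NFD normalization splits each accented letter into a base letter plus a combining mark, and encoding to ASCII with 'ignore' then drops the marks, so the accents are lost. A rough Python 3 equivalent might look like the sketch below; the function name is hypothetical and the input is assumed to be ASCII text with escapes.

```python
import unicodedata


def decode_and_strip_accents(text):
    # Step 1: resolve the backslash-u escapes (assumes ASCII-only input).
    decoded = text.encode("ascii").decode("unicode_escape")
    # Step 2: decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", decoded)
    # Step 3: drop everything outside ASCII, which removes the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(decode_and_strip_accents(r"m\u011bs\u00edc\u00edch"))
```

This is only suitable when plain ASCII output is acceptable; it does not give back "měsících".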

Upvotes: 0

Ned Batchelder

Reputation: 376012

If it is JavaScript code, then perhaps it's actually JSON, and you can use json.loads to decode it.
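As a sketch of that idea: if you only have the bare contents of a string rather than a whole JSON document, you can wrap it in double quotes so that it forms a JSON string literal. The helper name below is hypothetical, and it assumes the text contains no unescaped double quotes or raw control characters.

```python
import json


def decode_json_string_body(body):
    # Wrap the bare text in quotes so it parses as a JSON string literal.
    # Assumes `body` has no unescaped double quotes or raw control characters;
    # otherwise json.loads raises json.JSONDecodeError.
    return json.loads('"' + body + '"')


raw = r"V posledn\u00edch m\u011bs\u00edc\u00edch"
print(decode_json_string_body(raw))
```

If the source really is a complete JSON document, calling json.loads on the whole thing is simpler and decodes every string in it at once.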

Upvotes: 1
