How to decode partially escaped unicode string in python (mixed unicode and escaped unicode)?

Question

Given the following string:

str = "\\u20ac €"

How can I decode it into € €?

Using str.encode("utf-8").decode("unicode-escape") returns € â\x82¬

(To clarify, I am looking for a general solution for decoding any mix of literal Unicode characters and escaped ones.)
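
For reference, the mangled output above can be reproduced step by step. The sketch below assumes the string holds a literal backslash followed by u20ac; the variable names are only illustrative. The unicode-escape codec decodes every non-escaped byte as Latin-1, so the three UTF-8 bytes of the real € come back as three separate characters:

```python
s = "\\u20ac €"  # a literal backslash-u escape, then a real euro sign

raw = s.encode("utf-8")
# raw now holds the ASCII escape plus the 3 UTF-8 bytes of the euro sign
print(raw)

# unicode-escape interprets the escape correctly, but treats the remaining
# bytes as Latin-1, one character per byte.
decoded = raw.decode("unicode-escape")
print(ascii(decoded))
```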

Mark Tolonen · Accepted Answer

A simple and fast solution is to use re.sub to match \u and exactly four hexadecimal digits, and convert those digits into a Unicode code point:

import re

s = r"blah bl\uah \u20ac € b\u20aclah\u12blah blah"
print(s)

s = re.sub(r'\\u([0-9a-fA-F]{4})', lambda m: chr(int(m.group(1), 16)), s)
print(s)

Output:

blah bl\uah \u20ac € b\u20aclah\u12blah blah
blah bl\uah € € b€lah\u12blah blah
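
If the input may also contain eight-digit \U escapes, the same idea can be wrapped in a small helper. This is only a sketch under that assumption, not part of the original answer: the name decode_u_escapes and the choice to leave out-of-range code points untouched are hypothetical.

```python
import re

# Hypothetical helper pattern: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})')

def decode_u_escapes(text):
    """Replace well-formed escapes and leave everything else untouched."""
    def repl(m):
        code = int(m.group(1) or m.group(2), 16)
        # Code points above U+10FFFF are invalid, so keep the original text.
        return chr(code) if code <= 0x10FFFF else m.group(0)
    return _ESCAPE.sub(repl, text)

# Malformed (\u12) and out-of-range (\U99999999) escapes are left as they are.
result = decode_u_escapes(r"\u20ac € \U0001F600 \U99999999 \u12")
```

Note that a surrogate pair written as two \u escapes (for example \ud83d\ude00) is converted to two separate surrogate code points by this sketch rather than being combined into one character.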
