reynoldsnlp
reynoldsnlp

Reputation: 1210

python3: Unescape unicode escapes surrounded by unescaped characters

I received json data that has some unicode characters escaped, and others not.

>>> example = r'сло\u0301во'

What is the best way to unescape those characters? In the example below, what would the function unescape look like? Is there a built-in function that does this?

>>> unescape(example)
сло́во

Upvotes: 1

Views: 1474

Answers (1)

reynoldsnlp
reynoldsnlp

Reputation: 1210

This solution assumes that every instance of \u in the original string is a unicode escape:

def unescape(in_str):
    """Unicode-unescape string with only some characters escaped."""
    in_str = in_str.encode('unicode-escape')   # bytes with all chars escaped (the original escapes have the backslash escaped)
    in_str = in_str.replace(b'\\\\u', b'\\u')  # unescape the \
    in_str = in_str.decode('unicode-escape')   # unescape unicode
    return in_str

...or in one line...

def unescape(in_str):
    """Unicode-unescape string with only some characters escaped."""
    return in_str.encode('unicode-escape').replace(b'\\\\u', b'\\u').decode('unicode-escape')

Upvotes: 1

Related Questions