Kemian

Reputation: 23

Using f-strings with unicode escapes

I've got strings that look something like this: a = "testing test<U+00FA>ing <U+00F3>"

The format will not always be like that, but those Unicode code points in angle brackets will be scattered throughout the strings. I want to turn them into the actual Unicode characters they represent. I tried this function:

def replace_unicode(s):
    uni = re.findall(r'<U\+\w\w\w\w>', s)

    for a in uni:
        s = s.replace(a, f'\u{a[3:7]}')
    return s

This successfully finds all of the <U+> unicode strings, but it won't let me put them together to create a unicode escape in this manner.

  File "D:/Programming/tests/test.py", line 8
    s = s.replace(a, f'\u{a[3:7]}')
                     ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

How can I create a unicode escape character using an f-string, or via some other method with the information I'm getting from strings?

Upvotes: 2

Views: 4874

Answers (2)

wjandrea

Reputation: 32954

chepner's answer is good, but you don't actually need an f-string: int() does not require the 0x prefix, so int(a[3:7], base=16) works perfectly fine on its own.

Also, it would make a lot more sense to use re.sub() instead of re.findall() then str.replace(). I would also restrict the regex down to just hex digits and group them.

import re

def replace_unicode(s):
    pattern = re.compile(r'<U\+([0-9A-F]{4})>')
    return pattern.sub(lambda match: chr(int(match.group(1), base=16)), s)

a = "testing test<U+00FA>ing <U+00F3>"
print(replace_unicode(a))  # -> testing testúing ó
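
If the input might also contain code points outside the four-digit range (for example <U+1F600>) or lowercase hex digits, the same re.sub() approach could be widened. This is only a sketch under those assumptions: the name replace_unicode_any and the 4-to-6 digit range are mine, not something from the question.

```python
import re

# Assumption: tokens carry 4 to 6 hex digits, in either case.
PATTERN = re.compile(r'<U\+([0-9A-Fa-f]{4,6})>')

def replace_unicode_any(s):
    def to_char(match):
        code_point = int(match.group(1), 16)
        # chr() raises ValueError above U+10FFFF, so leave such tokens untouched.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)
    return PATTERN.sub(to_char, s)

print(replace_unicode_any("smile <U+1F600> test<U+00fa>ing"))
```

Tokens that don't match the pattern, or whose value is too large to be a code point, are left in the string as-is rather than raising.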

Upvotes: 4

chepner

Reputation: 531055

You can use an f-string to create an appropriate argument to int, whose result the chr function can use to produce the desired character.

for a in uni:
    s = s.replace(a, chr(int(f'0x{a[3:7]}', base=16)))
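
The original f'\u{a[3:7]}' fails because \u escapes are resolved when the string literal itself is parsed, before any {} expression is evaluated, so the parser sees a \u with no hex digits after it. Building the character at runtime with chr() avoids that. For reference, here is the loop above dropped into the question's function as a self-contained sketch; it keeps the question's findall pattern unchanged.

```python
import re

def replace_unicode(s):
    # Find every <U+XXXX> token (same pattern as in the question).
    uni = re.findall(r'<U\+\w\w\w\w>', s)
    for a in uni:
        # a[3:7] holds the four hex digits; int() with base=16 accepts the 0x prefix.
        s = s.replace(a, chr(int(f'0x{a[3:7]}', base=16)))
    return s

print(replace_unicode("testing test<U+00FA>ing <U+00F3>"))  # -> testing testúing ó
```

Note that \w also matches non-hex characters, so a token like <U+ZZZZ> would make int() raise a ValueError with this pattern.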

Upvotes: 1
