Convert string containing "b'…'" to unicode

Question

I have been trying for several hours to find a solution to this problem. I need to read in a generated CSV file whose column headers are in a format like this:

"b'Device Name' (b'')"

or even

"b'Bezugsz\xc3\xa4hler' (b'Wh')"

I want to convert these strings to Unicode, but so far I have had no luck. None of the encode or decode examples I found led anywhere useful. I need to get rid of the b'…' wrappers as well as the \x escapes.

I hope someone here has some useful information. :)

Edit: as requested, the desired output:

"Device Name ()"
"Bezugszähler (Wh)"

The first case is easy to achieve with replace(), but I am looking for a solution to the second case, which would then naturally cover the first as well.

I tried solutions with ast.literal_eval(), but it chokes on the parentheses. Solutions with .encode().decode() also did not work as expected.

wjandrea · Accepted Answer

Here's a quick and dirty way to do this:

1. Use regex to find the faux-bytes
2. Use ast.literal_eval() to convert them to actual bytes
3. Decode bytes to strings
4. Insert back into a template

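# Raw strings keep the backslashes literal, matching the text as it
# appears in the CSV file.
headers = [
    r"b'Device Name' (b'')",
    r"b'Bezugsz\xc3\xa4hler' (b'Wh')"]

# ---

import ast
import re

def f(string):
    # Find each b'...' fragment in the header
    faux_bytes = re.findall(r"b'.*?'", string)
    # Evaluate each fragment as a Python literal to get real bytes
    real_bytes = [ast.literal_eval(fb) for fb in faux_bytes]
    # Decode the bytes (UTF-8 by default)
    decoded = [b.decode() for b in real_bytes]
    # The template assumes exactly two fragments: name and unit
    return '{} ({})'.format(*decoded)

result = [f(h) for h in headers]
print(result)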

Output:

['Device Name ()', 'Bezugszähler (Wh)']
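If the headers do not always follow the "name (unit)" template, the same idea can be applied with re.sub() and a callback, so that each b'…' fragment is decoded in place and everything around it is left untouched. The following is only a sketch: the names BYTES_REPR and decode_bytes_reprs are made up for illustration, and it assumes the fragments are single-quoted and UTF-8 encoded, as in the question. A bytes repr that Python wrote with double quotes, such as b"it's", would not be matched.

```python
import ast
import re

# Matches one single-quoted bytes repr such as b'Wh' or b'Bezugsz\xc3\xa4hler'.
# (?:[^'\\]|\\.)* accepts ordinary characters and backslash escapes.
BYTES_REPR = re.compile(r"b'(?:[^'\\]|\\.)*'")

def decode_bytes_reprs(text, encoding="utf-8"):
    """Replace every b'...' fragment in text with its decoded content."""
    def _decode(match):
        # literal_eval turns the fragment into real bytes; decode gives str
        return ast.literal_eval(match.group(0)).decode(encoding)
    return BYTES_REPR.sub(_decode, text)

headers = [
    r"b'Device Name' (b'')",
    r"b'Bezugsz\xc3\xa4hler' (b'Wh')",
]
print([decode_bytes_reprs(h) for h in headers])
```

Because only the matched fragments are replaced, a header with one fragment, three fragments, or no fragment at all passes through without raising an error.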
