Reputation: 53
I have been trying for several hours to find a solution to this problem. I need to read a generated CSV file whose column headers are in a format like this:
"b'Device Name' (b'')"
or even
"b'Bezugsz\\xc3\\xa4hler' (b'Wh')"
I want to convert these strings to Unicode. However, so far I've had no luck. None of the encode or decode examples I found led anywhere useful. I need to get rid of the b'…' part as well as the \x escapes.
I hope someone here has some useful information. :)
Edit: as requested, the desired output:
"Device Name ()"
"Bezugszähler (Wh)"
The first case is easy to achieve with replace(), but I'm looking for a solution to the second case, which would naturally cover the first one as well.
I tried solutions with ast.literal_eval() but this chokes on the parentheses. Solutions with .encode().decode() also did not work as expected.
Upvotes: 1
Views: 181
Reputation: 33107
Here's a quick and dirty way to do this: find the bytes literals with a regex, use ast.literal_eval() to convert them to actual bytes, then decode them.
headers = [
    "b'Device Name' (b'')",
    "b'Bezugsz\\xc3\\xa4hler' (b'Wh')"]
# ---
import ast
import re
def f(string):
    faux_bytes = re.findall(r"b'.*?'", string)
    real_bytes = [ast.literal_eval(fb) for fb in faux_bytes]
    decoded = [s.decode() for s in real_bytes]
    return '{} ({})'.format(*decoded)
result = [f(h) for h in headers]
print(result)
Output:
['Device Name ()', 'Bezugszähler (Wh)']
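If the headers don't always follow the exact "name (unit)" layout, a variation on the same idea is to replace each bytes literal in place with re.sub() instead of rebuilding the string with format(). The following is only a sketch: the names BYTES_LITERAL and decode_bytes_literals are made up for illustration, and it assumes the literals are single-quoted and UTF-8 encoded, as in the question.

```python
import ast
import re

# Matches a single-quoted bytes literal such as b'Wh', allowing
# backslash escapes (e.g. \xc3) between the quotes.
BYTES_LITERAL = re.compile(r"b'(?:[^'\\]|\\.)*'")


def decode_bytes_literals(text, encoding="utf-8"):
    """Replace every b'...' literal in text with its decoded string."""
    def _decode(match):
        # literal_eval turns the textual literal into a real bytes object.
        return ast.literal_eval(match.group(0)).decode(encoding)

    return BYTES_LITERAL.sub(_decode, text)


headers = [
    "b'Device Name' (b'')",
    "b'Bezugsz\\xc3\\xa4hler' (b'Wh')",
]
print([decode_bytes_literals(h) for h in headers])
```

This prints the same list as above, and everything outside the b'…' parts (spaces, parentheses) is left untouched. Double-quoted literals such as b"it's" are not matched by this pattern and would need an extra alternative in the regex.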
Upvotes: 1