Reputation: 2489
Given the following file:
b'Clay Regazzoni'
b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\xe9 le \xe0"
b'Lucie de Syracuse'
b'Lucie de Syracuse ou sainte Lucie, vierge et martyre dont le nom est illustr\xe9'
How can I extract and decode each line separately?
Each line was separately encoded using utf-8, but the file was stored using the default encoding.
My attempt was
open('path','r').readlines()[1].decode('latin1')
which fails (str has no decode attribute), as
secondline = 'b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\xe9 le \xe0"'
and not
secondline = b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\xe9 le \xe0"
The desired output is
>>>open('path','r').readlines()[1].decode('latin1')
Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, né le à
Upvotes: 0
Views: 47
Reputation: 30113
Use the ast module as follows:
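import ast

with open('x.txt', 'r') as f:
    for line in f:
        # A line that looks like a bytes literal is parsed back into bytes, then decoded
        if line.startswith(("b'", 'b"')):
            print(ast.literal_eval(line).decode('latin1'))
        else:
            # Any other line already ends with a newline, so do not add a second one
            print(line, end='')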
Output:
Clay Regazzoni
Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, né le à
Lucie de Syracuse
Lucie de Syracuse ou sainte Lucie, vierge et martyre dont le nom est illustré
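The same idea can be checked without a file. The following is a minimal sketch, not part of the original answer: the helper name decode_bytes_literal_lines and the in-memory sample list are made up for illustration. It assumes every line holds the text of a Python bytes literal and that the bytes are Latin-1, as the \xe9 and \xe0 escapes in the sample suggest (decoding those particular bytes as utf-8 would raise UnicodeDecodeError).

```python
import ast


def decode_bytes_literal_lines(lines, encoding="latin1"):
    """Decode lines that hold the repr of a bytes object, e.g. b'n\\xe9'.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    decoded = []
    for line in lines:
        text = line.strip()
        if text.startswith(("b'", 'b"')):
            # literal_eval safely turns the text b'...' back into real bytes
            decoded.append(ast.literal_eval(text).decode(encoding))
        else:
            # Lines that are not bytes literals are kept as plain text
            decoded.append(text)
    return decoded


# Two lines as they would be read from the file in text mode:
# the backslashes are literal characters, not escapes.
sample = [
    "b'Clay Regazzoni'\n",
    'b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\\xe9 le \\xe0"\n',
]
result = decode_bytes_literal_lines(sample)
print(result)
```

Returning a list instead of printing makes the result easy to reuse, and ast.literal_eval is used rather than eval because it only accepts literals and will not run arbitrary code found in the file.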
Upvotes: 1