DsCpp

Reputation: 2489

Read partially encoded file (each line encoded separately)

Given the following file:

b'Clay Regazzoni' 
b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\xe9 le \xe0"

b'Lucie de Syracuse' 
b'Lucie de Syracuse ou sainte Lucie, vierge et martyre dont le nom est illustr\xe9'

How can I extract and decode each line separately? Each line was encoded separately using UTF-8, but the file itself was stored using the default encoding.

My attempt was

open('path','r').readlines()[1].decode('latin1')

which fails (str has no decode attribute), because the line is read back as a string:

secondline = 'b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\xe9 le \xe0"'
and not 
secondline = b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\xe9 le \xe0"

The desired output is

>>>open('path','r').readlines()[1].decode('latin1')
Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, né le à 

Upvotes: 0

Views: 47

Answers (1)

JosefZ

Reputation: 30113

Use ast.literal_eval from the ast module to turn the text of each bytes literal into an actual bytes object, then decode it:

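import ast

with open('x.txt', 'r') as f:
    for line in f:
        # Lines that hold the text of a bytes literal start with b' or b"
        if line.startswith(("b'", 'b"')):
            # literal_eval safely parses the literal into a bytes object
            print(ast.literal_eval(line).decode('latin1'))
        else:
            # Anything else (e.g. a blank line) is printed unchanged
            print(line)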

Output:

Clay Regazzoni
Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, né le à


Lucie de Syracuse
Lucie de Syracuse ou sainte Lucie, vierge et martyre dont le nom est illustré
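
If the decoded lines are needed as values rather than printed, the same idea can be wrapped in a small helper. This is only a sketch: the name decode_line, its encoding parameter and the sample list are made up for illustration, and it assumes every encoded line is a valid Python bytes literal. The default stays latin1 because a lone \xe9 byte, as in the sample data, is not valid UTF-8.

```python
import ast


def decode_line(line, encoding="latin1"):
    """Return the text of one line read from the file.

    Hypothetical helper: lines that look like a bytes literal (b'...' or
    b"...") are parsed with ast.literal_eval and decoded; any other line
    is returned with its trailing newline removed.
    """
    text = line.strip()
    if text.startswith(("b'", 'b"')):
        try:
            value = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            # Not a well-formed literal after all: fall through unchanged
            value = None
        if isinstance(value, bytes):
            return value.decode(encoding)
    return line.rstrip("\n")


# Sample lines as they would come back from reading the file in text mode
# (the backslashes are literal characters in the file).
sample = [
    "b'Clay Regazzoni' \n",
    'b"Gianclaudio Giuseppe Regazzoni, dit Clay Regazzoni, n\\xe9 le \\xe0"\n',
    "\n",
]
decoded = [decode_line(raw) for raw in sample]
```

Here decoded holds one plain str per input line, with the blank line becoming an empty string. Pass a different encoding argument if the bytes turn out to be in another encoding.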

Upvotes: 1
