Reputation: 47
C�r�monie: how would I decode these characters in Python to get cérémonie?
line.encode('utf-8').decode('utf-8')
I've tried decoding it as "latin-1" and as "utf-8", but I get the same result: C�r�monie. Is it because "line" is already a string that I can't decode it directly?
I also tried passing an encoding when I opened the file, but got the same result: C�r�monie
f = open('data/u.item', 'r', encoding='latin-1')
lines = f.readlines()
for line in lines:
    print(line)
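One way to narrow this down is to look at the raw bytes before any decoding happens. The following is only a minimal Python 3 sketch: `decode_with_fallback` is a hypothetical helper, and the sample bytes are an assumption standing in for one line of the file (é stored as the single Latin-1 byte 0xE9), not data taken from the post.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw without error."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")

# Assumed sample; for the real file, read bytes instead of text, e.g.:
#     with open('data/u.item', 'rb') as f:
#         sample = f.readline()
sample = b"C\xe9r\xe9monie"

text, used = decode_with_fallback(sample)
print(repr(sample))       # the raw bytes, independent of any console encoding
print(ascii(text), used)  # the decoded code points, escaped so the console cannot mangle them
```

If the raw bytes show `\xef\xbf\xbd` where the é should be, the file itself contains the UTF-8 encoding of the U+FFFD replacement character, and the original letter was lost before the file was written. If the text decodes correctly here but `print(line)` still shows �, the console's output encoding is the more likely culprit.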
Upvotes: 0
Views: 1275
Reputation: 21
I work on the binary representation (Python 2.7).
I do this because Python was not able to read my string.
Example of use:
First, I convert the string (data) into a space-separated string of binary values:
binaire = ' '.join(format(ord(x), 'b') for x in data)
Then I print it to find which bit pattern corresponds to which character:
print binaire
I replace the offending bit pattern, for example:
binaire = binaire.replace("11101010", "1100101") # replace ê by e in my case
Finally, I convert the bit string back into a Python string:
res = bitstring_to_bytes(binaire)
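# Note: this relies on Python 2.7, where bytes is an alias of str.
def bitstring_to_bytes(tab):
    # Split the space-separated bit string and rebuild the text char by char.
    tab = tab.split(" ")
    string = ""
    for t in tab:
        string = string + bitchar_to_bytes(t)
    return string

def bitchar_to_bytes(s):
    # Parse one group of bits as an integer, then emit its bytes, big-endian.
    v = int(s, 2)
    b = bytearray()
    while v:
        b.append(v & 0xff)
        v >>= 8
    return bytes(b[::-1])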
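If the goal is only to replace accented letters such as ê with their plain ASCII base letter, a shorter route may be Unicode normalization. This is a different technique from the bit-string replacement above, sketched here for Python 3 on already-decoded text; `strip_accents` is a hypothetical helper name and the sample string is made up for illustration.

```python
import unicodedata

def strip_accents(text):
    # NFKD splits a letter like 'ê' into 'e' plus a combining circumflex (U+0302).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

result = strip_accents("c\u00e9r\u00e9monie f\u00eate")
print(result)
```

This only helps once the text has been decoded correctly; it cannot recover characters that have already been turned into �.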
Upvotes: 1