LarmadVara

Reputation: 47

Python decoding weird characters

C�r�monie: how would I decode these characters in Python to get cérémonie?

 line.encode('utf-8').decode('utf-8')

I've tried decoding it as "latin-1" and "utf-8" but get the same result: C�r�monie. Since "line" is already a string, I can't decode it directly?

I tried to use an encoding when I opened the file too, but got the same result: C�r�monie

with open('data/u.item', 'r', encoding='latin-1') as f:
    lines = f.readlines()
for line in lines:
    print(line)
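
For reference, a minimal Python 3 sketch of the two behaviours described above. The sample bytes are an assumption standing in for a Latin-1 encoded line; they are not the actual contents of data/u.item.

```python
# A str that already contains U+FFFD replacement characters.
text = 'C\ufffdr\ufffdmonie'

# Encoding and decoding with the same codec round-trips unchanged,
# so it cannot recover the original letters.
round_trip = text.encode('utf-8').decode('utf-8')

# Assumed sample: 'Cérémonie' as Latin-1 bytes (é is the single byte 0xE9).
raw = b'C\xe9r\xe9monie'

as_latin1 = raw.decode('latin-1')                # accents preserved
as_utf8 = raw.decode('utf-8', errors='replace')  # invalid bytes become U+FFFD

# ascii() keeps the output printable on any console encoding.
print(round_trip == text)
print(ascii(as_latin1))
print(ascii(as_utf8))
```

If the bytes decode correctly but the printed output still shows the replacement character, the console or output encoding may be what loses the accents. That is only a guess, since the raw bytes of the file are not shown here.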

Upvotes: 0

Views: 1275

Answers (1)

Edmond de Martimprey

Reputation: 21

I work on the binary representation of the string (Python 2.7).

I do this because Python was not able to read my string.

Example of use:

First, I convert the string (data) into a string of binary values:

binaire = ' '.join(format(ord(x), 'b') for x in data)

I print it to find which bit pattern corresponds to which character:

print binaire

I replace the bit patterns as needed, for example:

binaire = binaire.replace("11101010", "1100101")  # replace ê with e in my case

I convert the result back into a Python string:

res = bitstring_to_bytes(binaire)



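def bitstring_to_bytes(tab):
    # Convert space-separated binary values back into a string (Python 2.7).
    tab = tab.split(" ")
    string = ""
    for t in tab:
        string = string + bitchar_to_bytes(t)
    return string

def bitchar_to_bytes(s):
    # Convert one binary value, e.g. "1100101", into its byte(s).
    v = int(s, 2)
    b = bytearray()
    while v:
        b.append(v & 0xff)  # collect the lowest byte first
        v >>= 8
    return bytes(b[::-1])  # reverse so the highest byte comes first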
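
The code above targets Python 2.7. A rough Python 3 sketch of the same idea might look like the following; the helper names, the sample input and the replacement table are assumptions made for illustration, not part of the original code.

```python
def to_bit_string(data: bytes) -> str:
    """Return the bytes as space-separated binary values."""
    return ' '.join(format(b, 'b') for b in data)


def from_bit_string(bits: str) -> bytes:
    """Rebuild bytes from space-separated binary values (each below 256)."""
    return bytes(int(token, 2) for token in bits.split())


# Assumed sample input: 'Cérémonie' encoded as Latin-1.
data = 'C\u00e9r\u00e9monie'.encode('latin-1')

# Hypothetical replacement table: é (0xE9) -> e (0x65).
replacements = {'11101001': '1100101'}

# Replace whole tokens rather than substrings, so one pattern
# can never match part of a neighbouring value.
tokens = to_bit_string(data).split()
patched = ' '.join(replacements.get(t, t) for t in tokens)

result = from_bit_string(patched).decode('ascii')
print(result)
```

Note that this strips the accent instead of restoring it. Where the real encoding of the bytes is known, decoding with that codec is likely the simpler route.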

Upvotes: 1
