Jean David

How to print a string containing UTF-8 code read from a file

I have a file which contains UTF-8 encoded text:

b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85 \xd9\x87\xd8\xb0\xd8\xa7 \xd8\xa7\xd9\x84\xd8\xb0\xd9\x8a \xd9\x84\xd9\x85 \xd9\x8a\xd8\xb3\xd8\xaa\xd8\xb7\xd8\xb9 \xd8\xad\xd8\xaa\xd9\x89 \xd8\xa7\xd9\x84\xd8\xa2\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xb6\xd8\xb9 \xd8\xa3\xd8\xb5\xd9\x88\xd8\xa7\xd8\xaa \xd9\x85\xd9\x86 \xd9\x86\xd8\xad\xd8\xa8 \xd9\x81\xd9\x8a \xd8\xa3\xd9\x82\xd8\xb1\xd8\xa7\xd8\xb5 \xd8\x8c \xd8\xa3\xd9\x88 \xd8\xb2\xd8\xac\xd8\xa7\xd8\xac\xd8\xa9 \xd8\xaf\xd9\x88\xd8\xa7\xd8\xa1 \xd9\x86\xd8\xaa\xd9\x86\xd8\xa7\xd9\x88\xd9\x84\xd9\x87\xd8\xa7 \xd8\xb3\xd8\xb1\xd9\x91\xd9\x8b\xd8\xa7 \xd8\x8c \xd8\xb9\xd9\x86\xd8\xaf\xd9\x85\xd8\xa7 \xd9\x86\xd8\xb5\xd8\xa7\xd8\xa8 \xd8\xa8\xd9\x88\xd8\xb9\xd9\x83\xd8\xa9 \xd8\xb9\xd8\xa7\xd8\xb7\xd9\x81\xd9\x8a\xd8\xa9 \xd8\xa8\xd8\xaf\xd9\x88\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xaf\xd8\xb1\xd9\x8a \xd8\xb5\xd8\xa7\xd8\xad\xd8\xa8\xd9\x87\xd8\xa7 \xd9\x83\xd9\x85 \xd9\x86\xd8\xad\xd9\x86 \xd9\x86\xd8\xad\xd8\xaa\xd8\xa7\xd8\xac\xd9\x87 - \xd8\xa3\xd8\xad\xd9\x84\xd8\xa7\xd9\x85 \xd9\x85\xd8\xb3\xd8\xaa\xd8\xba\xd8\xa7\xd9\x86\xd9\x85\xd9\x8a, \xd8\xb9\xd8\xa7\xd8\xa8\xd8\xb1 \xd8\xb3\xd8\xb1\xd9\x8a\xd8\xb1'

I've tried to decode it and print it correctly, but did not succeed when:

  1. reading the file in text mode ('r') and decoding with bytes(text,'utf8').decode('utf8')

  2. reading the file in binary mode ('rb') and decoding with binary.decode('utf8')
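Both attempts can be reproduced with a temporary file, under the assumption that the file holds the printed representation of a bytes object (the ASCII characters b, ', \, x, d, 8, ...) rather than the raw UTF-8 bytes. In this sketch each attempt faithfully returns the same escape text that is in the file, which would explain why neither prints Arabic. The two-character sample is an assumed fragment of the text above.

```python
import os
import tempfile

# Assumption: the file contains the printed form of a bytes object, not raw UTF-8 bytes
sample_repr = str(b'\xd8\xa3\xd9\x8a')   # the text b'\xd8\xa3\xd9\x8a' as plain ASCII

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='ascii') as tmp:
    tmp.write(sample_repr)
    path = tmp.name

# Attempt 1: text mode, then encode and decode again (a round trip that changes nothing)
with open(path, 'r') as f:
    text = f.read()
attempt1 = bytes(text, 'utf8').decode('utf8')

# Attempt 2: binary mode, then decode (the bytes on disk are ASCII escape characters)
with open(path, 'rb') as f:
    binary = f.read()
attempt2 = binary.decode('utf8')

os.remove(path)

# Both give back the escape text itself, not Arabic characters
print(attempt1)  # b'\xd8\xa3\xd9\x8a'
print(attempt2)  # b'\xd8\xa3\xd9\x8a'
```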

I tried to convert the content in many ways (splitting the text into a list, cutting out the b' ... ', ...) but didn't manage to print it as readable text.
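Cutting out the b' and ' is not enough on its own: what remains is still a run of backslash escapes written as ASCII characters. One way to turn them back into bytes, assuming the content is only \xNN escapes plus plain ASCII (no embedded quotes or other escape forms), is the `unicode_escape` codec combined with a `latin-1` round trip, since `latin-1` maps code points 0 to 255 one-to-one onto byte values. `decode_escaped` is a hypothetical helper name.

```python
def decode_escaped(text: str) -> str:
    """Hypothetical helper: turn "b'\\xd8\\xa3...'" text into the decoded string."""
    inner = text.strip()
    # Drop the b'...' wrapper, as the question describes doing by hand
    if inner.startswith("b'") and inner.endswith("'"):
        inner = inner[2:-1]
    # 'unicode_escape' turns each \xNN escape into the single character U+00NN ...
    chars = inner.encode('latin-1').decode('unicode_escape')
    # ... and latin-1 maps those characters back to the byte values 0xNN
    raw = chars.encode('latin-1')
    # Only now are the real UTF-8 bytes available to decode
    return raw.decode('utf-8')


sample = "b'\\xd8\\xa3\\xd9\\x8a \\xd8\\xb9\\xd9\\x84\\xd9\\x85'"
print(decode_escaped(sample))
```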

What am I missing? Is the file correctly 'encoded'?

Here is my code (Python 3.7.3):

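# Python 3.7.3: read the whole file as text
with open('/home/pi/Desktop/unicode_a_decoder.txt', 'r') as f:
    text = f.read()
print(type(text), text)        # text is a str

# Earlier attempts, kept for reference:
#seq = text.decode             # fails: str has no .decode() in Python 3
#seq = bytes(text,"utf8")      # UTF-8 bytes of whatever characters were read
#print('seq',seq)
#seq = text

# Split on spaces and inspect the first word
seq = text.split(" ")
#print(seq, seq[0],bytes(seq[0]))   # bytes(str) without an encoding raises TypeError
print('seq', seq)
s0 = seq[0]
print(s0, type(s0))
s02byte = bytes(s0, 'utf8')    # UTF-8 bytes of the characters in s0
print(s02byte, type(s02byte))
#print(seq.decode("utf8"))     # fails: a list has no .decode()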
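A more robust sketch, swapping in `ast.literal_eval` (a technique not used in the question), lets Python parse the file's content as the bytes literal it appears to be, and only then decodes it. `ast.literal_eval` evaluates literals only, not arbitrary code. This assumes the file holds exactly one b'...' literal; `decode_bytes_literal` and `read_bytes_literal` are hypothetical helper names, and the path in the usage line is the one from the question.

```python
import ast


def decode_bytes_literal(text: str) -> str:
    """Hypothetical helper: turn text such as "b'\\xd8\\xa3'" into the decoded string."""
    data = ast.literal_eval(text.strip())   # parse the b'...' literal into a real bytes object
    if not isinstance(data, bytes):
        raise ValueError("expected a bytes literal such as b'...'")
    return data.decode('utf-8')             # the actual UTF-8 decode


def read_bytes_literal(path: str) -> str:
    """Hypothetical helper: read a file that holds one b'...' literal and decode it."""
    with open(path, 'r', encoding='utf-8') as f:
        return decode_bytes_literal(f.read())


# Usage with the path from the question:
# print(read_bytes_literal('/home/pi/Desktop/unicode_a_decoder.txt'))
```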

Answers (1)

The Pilot Dude

For me, it worked when I simply used .decode().

This is what I did:

text = b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85 \xd9\x87\xd8\xb0\xd8\xa7 \xd8\xa7\xd9\x84\xd8\xb0\xd9\x8a \xd9\x84\xd9\x85 \xd9\x8a\xd8\xb3\xd8\xaa\xd8\xb7\xd8\xb9 \xd8\xad\xd8\xaa\xd9\x89 \xd8\xa7\xd9\x84\xd8\xa2\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xb6\xd8\xb9 \xd8\xa3\xd8\xb5\xd9\x88\xd8\xa7\xd8\xaa \xd9\x85\xd9\x86 \xd9\x86\xd8\xad\xd8\xa8 \xd9\x81\xd9\x8a \xd8\xa3\xd9\x82\xd8\xb1\xd8\xa7\xd8\xb5 \xd8\x8c \xd8\xa3\xd9\x88 \xd8\xb2\xd8\xac\xd8\xa7\xd8\xac\xd8\xa9 \xd8\xaf\xd9\x88\xd8\xa7\xd8\xa1 \xd9\x86\xd8\xaa\xd9\x86\xd8\xa7\xd9\x88\xd9\x84\xd9\x87\xd8\xa7 \xd8\xb3\xd8\xb1\xd9\x91\xd9\x8b\xd8\xa7 \xd8\x8c \xd8\xb9\xd9\x86\xd8\xaf\xd9\x85\xd8\xa7 \xd9\x86\xd8\xb5\xd8\xa7\xd8\xa8 \xd8\xa8\xd9\x88\xd8\xb9\xd9\x83\xd8\xa9 \xd8\xb9\xd8\xa7\xd8\xb7\xd9\x81\xd9\x8a\xd8\xa9 \xd8\xa8\xd8\xaf\xd9\x88\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xaf\xd8\xb1\xd9\x8a \xd8\xb5\xd8\xa7\xd8\xad\xd8\xa8\xd9\x87\xd8\xa7 \xd9\x83\xd9\x85 \xd9\x86\xd8\xad\xd9\x86 \xd9\x86\xd8\xad\xd8\xaa\xd8\xa7\xd8\xac\xd9\x87 - \xd8\xa3\xd8\xad\xd9\x84\xd8\xa7\xd9\x85 \xd9\x85\xd8\xb3\xd8\xaa\xd8\xba\xd8\xa7\xd9\x86\xd9\x85\xd9\x8a, \xd8\xb9\xd8\xa7\xd8\xa8\xd8\xb1 \xd8\xb3\xd8\xb1\xd9\x8a\xd8\xb1'
print(text.decode())
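A short sketch of why `.decode()` works here but not on the question's file: in this answer `text` is a bytes literal typed into the source, so Python has already turned each \xNN escape into a single byte value. Text read from the question's file in 'r' mode is, on the assumption that the file holds the printed form of the bytes, a str of escape characters instead, and str has no `.decode()` method in Python 3. The sample below is an assumed fragment of the text above, and `str()` is used only to simulate what the file read would return.

```python
# In source code the escapes are interpreted: this is a bytes object of 4 byte values
from_source = b'\xd8\xa3\xd9\x8a'
# Simulates reading the same characters from the question's file with open(..., 'r')
from_file = str(from_source)

print(type(from_source), len(from_source))   # <class 'bytes'> 4
print(type(from_file), len(from_file))       # <class 'str'> 19
print(from_source.decode())                  # bytes.decode() defaults to UTF-8
print(hasattr(from_file, 'decode'))          # False
```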