How to read and write UTF-8 in Python?

Question

I have some non-ASCII data in my Python script. Python handles it correctly, but when I try to save it I get an error, so I encode it with str.encode() and then write it to a file. In Python 2.7 I had no problem reading the file and decoding the data with str.decode() (data read from files are strings), but Python 3.6 has no str.decode() method, and that is where I am stuck.

I couldn't find an answer anywhere, not even in the official Python documentation.

Example code (please ignore the letter case, I'm writing this on my phone):

string="hello=سلام -in persian"
file=open("file.txt",'w+', encoding='utf-8')
file.write(string.encode())
# using file.write(string) raises an error
print(file.read())  # if the whole string is in Persian, this prints something like b'\xff\xa3....'
file.read().decode()  # raises an error saying: 'str' object has no attribute 'decode'
# this was my problem when updating from 2.7 to 3.6

file.close()


Taku · Accepted Answer

In Python 3, convert the str to bytes with str.encode(), then open the file in binary write mode, open('filename.txt', 'wb'), and write the bytes. When reading, open the file in binary read mode, open('filename.txt', 'rb'), and use bytes.decode() to convert the data back to str. Both methods default to UTF-8 in Python 3.

You can use this as a reference:

utfchar = '¶'
with open('filename.txt', 'wb') as fp:
    fp.write(utfchar.encode())

# and later:

with open('filename.txt', 'rb') as fp:
    utfchar = fp.read().decode()

assert utfchar == '¶'
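
As an alternative to the binary-mode approach above, here is a minimal sketch that opens the file in text mode with encoding='utf-8' and lets open() do the encoding and decoding, so you write and read str directly. The helper name roundtrip_text and the temporary path are hypothetical and only there for illustration; the sample string is the one from the question.

```python
import os
import tempfile


def roundtrip_text(text, path):
    """Write text as UTF-8 in text mode, then read it back as str."""
    # Text mode: pass a str to write(); open() encodes it to UTF-8 for us.
    with open(path, 'w', encoding='utf-8') as fp:
        fp.write(text)
    # Text mode again: read() returns a str that is already decoded.
    with open(path, 'r', encoding='utf-8') as fp:
        return fp.read()


# Hypothetical temporary location, so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'filename.txt')

original = 'hello=سلام -in persian'
restored = roundtrip_text(original, path)
assert restored == original

# The bytes on disk match what str.encode('utf-8') produces.
with open(path, 'rb') as fp:
    raw = fp.read()
assert raw == original.encode('utf-8')
```

The point is not to mix the two styles: a file opened in text mode expects str, and a file opened in binary mode expects bytes. Calling encode() yourself and then writing to a text-mode file, as in the question, raises a TypeError.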
