Reputation: 2379
I have an Arabic string in windows-1256 that I need to convert to ASCII so that it can be passed to html2text. However, when I run the code below, an error is raised stating that the 'str' object has no attribute 'decode'.
filename=codecs.open(keyworddir + "\\" + item, "r", encoding = "windows-1256")
outputfile=filename.readlines()
file=open(keyworddir + "\\" + item, "w")
for line in outputfile:
    line=line.decode(encoding='windows-1256')
    line=line.encode('UTF-8')
    file.write(line)
file.close()
Upvotes: 0
Views: 714
Reputation: 986
I had a similar problem. It took me five days of work to solve, and I finally used the following solution.
Before opening the file, run this command on the command line (a Linux command line, of course):
iconv -f 'windows-1256' -t 'utf-8' '[your file name]' -o '[output file name]'
You can run command-line commands automatically from Python code with a function like this:
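import subprocess
def run_cmd(cmd):
    # Run a shell command and wait for it to finish.
    process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    # communicate() drains stdout, so a full pipe cannot block the child.
    process.communicate()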
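A variation on the same idea, offered only as a sketch: passing the command as an argument list to subprocess.run avoids shell quoting problems with unusual file names. The helper names build_iconv_cmd and convert_with_iconv are made up for illustration, and the -o option is assumed to be available (it is in GNU iconv, which is what the command above uses).

```python
import subprocess

def build_iconv_cmd(src_path, dst_path, src_enc="windows-1256", dst_enc="utf-8"):
    # Hypothetical helper: builds the iconv argument list without going through a shell.
    return ["iconv", "-f", src_enc, "-t", dst_enc, "-o", dst_path, src_path]

def convert_with_iconv(src_path, dst_path):
    # check=True raises CalledProcessError if iconv exits with a non-zero status.
    subprocess.run(build_iconv_cmd(src_path, dst_path), check=True)
```

Calling convert_with_iconv("input.txt", "output.txt") would then run the conversion, provided iconv is installed and on the PATH.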
Upvotes: 0
Reputation: 9411
In Python 3, str is already a decoded Unicode string, so you cannot decode line again.
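A quick check illustrates the difference between the two types (a minimal sketch; the sample text is made up for illustration):

```python
raw = "مرحبا".encode("windows-1256")  # bytes, as they would be stored on disk
text = raw.decode("windows-1256")      # str, already Unicode

print(hasattr(text, "decode"))  # False: str has no decode() in Python 3
print(hasattr(raw, "decode"))   # True: only bytes can be decoded
```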
What you have missed is that decoding happens implicitly while the file is being read. codecs.open with "r" mode reads the file as text in the given encoding and automatically decodes everything it returns.
So you can either:
open the file in binary mode: filename=open(keyworddir + "\\" + item, "rb"); the lines will then be bytes, which can be decoded
or, better, simply remove the superfluous decoding step: line=line.decode(encoding='windows-1256')
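Note: you should consider opening the output file with codecs.open(keyworddir + "\\" + item, "w", encoding = "utf-8"), which makes it unnecessary to encode each line manually. This also matters for correctness: writing the bytes returned by encode to a file opened in plain "w" text mode raises a TypeError in Python 3.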
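Putting these points together, the conversion could look roughly like this. This is a sketch rather than a drop-in replacement: the function name convert_to_utf8 is made up for illustration, and it uses the built-in open, which accepts an encoding argument in Python 3 just as codecs.open does.

```python
def convert_to_utf8(path, source_encoding="windows-1256"):
    # Reading in text mode decodes each line to str automatically.
    with open(path, "r", encoding=source_encoding) as src:
        lines = src.readlines()
    # Writing in text mode encodes each str as UTF-8 automatically,
    # so no manual decode() or encode() call is needed.
    with open(path, "w", encoding="utf-8") as dst:
        dst.writelines(lines)
```

With the variables from the question, this would be called as convert_to_utf8(keyworddir + "\\" + item). Note that it overwrites the file in place, as the original code does.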
Upvotes: 1