Reputation: 2379

Python3 String has no decode for windows-1256

I have an Arabic string in windows-1256 that I need to convert into ASCII so that it can be passed to html2text. However, upon execution an error is raised stating that the str object has no attribute 'decode'.

filename=codecs.open(keyworddir + "\\" + item, "r", encoding = "windows-1256")
outputfile=filename.readlines()
file=open(keyworddir + "\\" + item, "w")
for line in outputfile:
    line=line.decode(encoding='windows-1256')
    line=line.encode('UTF-8')
    file.write(line)
file.close()

Upvotes: 0

Answers (2)

FazeL

Reputation: 986

I had a similar problem. It took me 5 days of work to solve, and I finally used the following solution.

Before opening the file, run this command on the command line (a Linux command line, of course):

iconv -f 'windows-1256' -t 'utf-8' '[your file name]' -o '[output file name]'

You can run command-line commands automatically from Python code using this function:

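import subprocess

def run_cmd(cmd):
    # Run a shell command and wait for it to finish. communicate() drains
    # stdout, so a large output cannot fill the pipe and block the child.
    process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    output, _ = process.communicate()
    return output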
Upvotes: 0

Karol S

Reputation: 9411

In Python 3, str is already a decoded Unicode string, so you cannot decode line again.

What you missed is that decoding happens implicitly while the file is being read. codecs.open in "r" mode reads the file as text in the given encoding and automatically decodes everything it returns.

So you can either:

open the file in binary mode: filename=open(keyworddir + "\\" + item, "rb"); the lines will then be bytes, which can be decoded
or, better, simply remove the superfluous decoding: ~~line=line.decode(encoding='windows-1256')~~
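The difference between the two options can be sketched with a small in-memory sample instead of the asker's file. The sample text and variable names are assumptions for illustration only.

```python
# Sample Arabic text, encoded the way it would be stored in a windows-1256 file.
raw = "مرحبا".encode("windows-1256")   # bytes, as returned when reading in "rb" mode

# Option 1: bytes read in binary mode do have a decode() method.
text = raw.decode("windows-1256")

# Option 2: a str (what text mode returns) is already decoded,
# so it has no decode() method in Python 3.
has_decode = hasattr(text, "decode")

# In both cases the resulting str can be encoded to UTF-8.
utf8 = text.encode("utf-8")
```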

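Note:
You should consider opening the output file with codecs.open(keyworddir + "\\" + item, "w", encoding = "utf-8"), which makes it unnecessary to encode each line manually. With the file opened in plain "w" text mode, writing the bytes returned by line.encode('UTF-8') raises a TypeError in Python 3.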
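Putting the advice together, a minimal sketch of the whole conversion might look like this. It uses the built-in open, which accepts an encoding argument in Python 3, rather than codecs.open; the function name `convert_to_utf8` and the throwaway demo file are assumptions for illustration, not the asker's actual paths.

```python
import os
import tempfile

def convert_to_utf8(path, source_encoding="windows-1256"):
    # Read everything first, because the same file is rewritten in place.
    # newline="" keeps the original line endings untouched on every platform.
    with open(path, "r", encoding=source_encoding, newline="") as src:
        lines = src.readlines()          # already str, no decode() needed
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(lines)            # encoded to UTF-8 by the file object

# Usage on a temporary file that stands in for keyworddir + "\\" + item.
sample = "مرحبا بالعالم\n"
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("windows-1256"))

convert_to_utf8(demo_path)

with open(demo_path, "rb") as f:
    converted = f.read()
os.remove(demo_path)
```

Neither decode() nor encode() is called on the lines here; both conversions are handled by the file objects.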

Upvotes: 1
