Jeff Winchell

Reputation: 113

Convert Unicode text to single-byte ASCII in Python

I have an input file whose data I need to process. The file is encoded in UTF-16 even though every character in it is a standard ASCII character.

I can NOT change the input file so that it stops using double-byte characters to represent 100% English-language, single-byte data. I need to convert this in Python, on Windows. (Please, no non-Python solutions, thank you.)

I want my Python program to act on these strings and output a file which is NOT double-byte. I just want standard ASCII strings (one byte per character).

I've googled a lot and seen all sorts of related questions, but not mine. I'm frustrated at not being able to solve this seemingly very simple problem.

EDIT: Here is the program I got to work. It is absurd. There must be an easier way. The chr(10) references in the code are there because the input has lines and I couldn't find a non-absurd way to do simple readline/writeline calls.

with open('Unicode.txt','r') as input:
    with open('ASCII.txt','w') as output:
        for line in input.readlines():
            codelist=[code for code in line.encode('ascii','ignore') if code not in (0,10)]
            if codelist:
                output.write(''.join([chr(code) for code in codelist]+[chr(10)]))
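A likely explanation for why this workaround has to filter out code 0: if the file is opened without an encoding argument, Python decodes it with the platform default rather than UTF-16, so the zero byte in each two-byte character comes through as a NUL character. The sketch below reproduces that in memory. It assumes little-endian UTF-16 and a cp1252 default, which are guesses about the setup and not facts from the post.

```python
# Assumption: the input is little-endian UTF-16 and the default Windows
# text encoding is cp1252. Neither is confirmed by the original post.
raw = "Hi\n".encode("utf-16-le")   # two bytes per character
misread = raw.decode("cp1252")     # roughly what open(path, 'r') would return

# Dropping the NUL characters recovers the intended text.
cleaned = "".join(ch for ch in misread if ch != "\x00")

print(repr(raw))
print(repr(misread))
print(repr(cleaned))
```

Each character is followed by a NUL in the misread string, which is why the list comprehension above has to discard code 0.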

Question solved after reading a hint from @Mark Ransom.

Upvotes: 0

Views: 1173

Answers (1)

Jeff Winchell

Reputation: 113

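# Decode the input as UTF-16 on read and encode as ASCII on write.
# Naming both encodings explicitly avoids relying on the platform default.
with open('unicode.txt', 'r', encoding='UTF-16') as src:
    with open('ascii.txt', 'w', encoding='ascii') as dst:
        dst.write(src.read())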
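For large files, the same idea can be applied one line at a time so the whole file is never held in memory. This is a minimal sketch, not a drop-in replacement: the helper name utf16_to_ascii and the temporary demo files are made up for illustration. The errors argument is the standard one accepted by open(); with the default "strict", a character outside ASCII raises UnicodeEncodeError, while "replace" or "ignore" would substitute or drop it.

```python
import os
import tempfile


def utf16_to_ascii(src_path, dst_path, errors="strict"):
    """Copy a UTF-16 text file to an ASCII text file, line by line.

    Hypothetical helper for illustration only.
    """
    with open(src_path, "r", encoding="utf-16") as src, \
         open(dst_path, "w", encoding="ascii", errors=errors) as dst:
        for line in src:
            dst.write(line)


# Demo: create a small UTF-16 file, convert it, and read back the raw bytes.
tmpdir = tempfile.mkdtemp()
src_path = os.path.join(tmpdir, "unicode.txt")
dst_path = os.path.join(tmpdir, "ascii.txt")

with open(src_path, "w", encoding="utf-16") as f:
    f.write("first line\nsecond line\n")

utf16_to_ascii(src_path, dst_path)

with open(dst_path, "rb") as f:
    result = f.read()

print(result)
```

The output file contains one byte per character. On Windows the line endings are written as CR LF because both files are opened in text mode.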

Upvotes: 1
