Reputation: 113
I have an input file whose data I need to process. The file is in UTF-16 even though every single character in it is a standard ASCII character.
I can NOT change the input file so that it doesn't use useless double-byte characters to represent 100% English-language single-character data. I need to convert this in Python, on Windows. (Please, no non-Python solutions, thank you.)
I want my Python program to act on these strings and output a file which is NOT double-byte. I just want standard ASCII strings (one byte per character).
I've googled a lot and seen all sorts of related questions, but not mine. I'm frustrated with not being able to solve this seemingly very simple problem.
EDIT: Here is the program I got to work. It is absurd. There must be an easier way. The chr(10) references in the code are there because the input has lines and I couldn't find a non-absurd way to do simple readline/writeline calls.
with open('Unicode.txt','r') as input:
    with open('ASCII.txt','w') as output:
        for line in input.readlines():
            codelist=[code for code in line.encode('ascii','ignore') if code not in (0,10)]
            if codelist:
                output.write(''.join([chr(code) for code in codelist]+[chr(10)]))
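For context, the stray NUL and ÿþ characters this code filters out come from reading the UTF-16 file without telling Python its encoding: each ASCII character is stored as two bytes, one of which is NUL, and the file starts with a byte-order mark. A minimal demonstration (the file name demo.txt and the cp1252 codec are assumptions for illustration, cp1252 being the usual Windows default):

    # Write a one-line UTF-16 file in binary so the bytes are explicit
    with open('demo.txt','wb') as f:
        f.write('Hi\n'.encode('utf-16'))  # BOM, then a NUL after every character

    # Reading it back without encoding='UTF-16' decodes it byte-by-byte
    with open('demo.txt','r',encoding='cp1252') as f:
        print(repr(f.read()))  # -> 'ÿþH\x00i\x00\n\x00'

That is why the list comprehension above has to drop the zeros, and why .encode('ascii','ignore') happens to discard the ÿþ BOM residue.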
Question solved after reading a hint from @Mark Ransom.
Upvotes: 0
Views: 1173
Reputation: 113
with open('unicode.txt','r',encoding='UTF-16') as input:
    with open('ascii.txt','w',encoding='ascii') as output:
        output.write(input.read())
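One caveat: if the input ever contains a character outside the ASCII range, output.write will raise UnicodeEncodeError. A sketch of a more tolerant variant, assuming silently dropping such characters is acceptable (errors='replace' would substitute '?' instead):

    with open('unicode.txt','r',encoding='UTF-16') as input:
        with open('ascii.txt','w',encoding='ascii',errors='ignore') as output:
            output.write(input.read())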
Upvotes: 1