user5597655

Encoding error when combining text files

I'm trying to run this code:

import glob
import io

read_files = filter(lambda f: f!='final.txt' and f!='result.txt', glob.glob('*.txt'))


with io.open("REGEXES.rx.txt", "w", encoding='UTF-32') as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
            outfile.write('|')

It is meant to combine some text files, but I get this error:

Traceback (most recent call last):
  File "/Users/kosay.jabre/Desktop/Password Assessor/RegexesNEW/CombineFilesCopy.py", line 10, in <module>
    outfile.write(infile.read())
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 2189: ordinal not in range(128)

I've tried UTF-8, UTF-16, UTF-32 and latin-1 encodings. Any ideas?

Upvotes: 0

Views: 1172

Answers (1)

Alastair McCormack

Reputation: 27704

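The error comes from infile.read(), not from the write. The input file was opened in text mode without an encoding, so Python falls back to the platform's preferred encoding (locale.getpreferredencoding(False)), which in your environment is ASCII. Any byte above \x7f (127) is not valid ASCII, so decoding it raises UnicodeDecodeError. This is also why changing the encoding of outfile has no effect on the error.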
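To see which encoding Python will pick when none is given, and to reproduce the failure in isolation, something like the following should work. It is only a diagnostic sketch; the variable names are illustrative, and the value printed for the default encoding depends on your machine.

```python
import locale

# The encoding that open() falls back to in text mode when no
# encoding argument is passed. The value is platform dependent.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Byte 0xa3 (the one from the traceback) is outside the ASCII range,
# so decoding it as ASCII fails in the same way.
try:
    b"\xa3".decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```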

You need to know the encoding of your files before you proceed, otherwise you will get errors if Python tries to read one encoding and gets another, or you will simply get mojibake.
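As a small illustration of both failure modes, here is what happens to a pound sign when the wrong codec is used. The choice of "£" is only an example; I don't know what character byte 0xa3 actually represents in your files.

```python
# "£" encoded as UTF-8 is two bytes; encoded as latin-1 it is one byte.
utf8_bytes = "£".encode("utf-8")      # b'\xc2\xa3'
latin1_bytes = "£".encode("latin-1")  # b'\xa3'

# Mojibake: UTF-8 bytes read as latin-1 decode without error,
# but produce the wrong text.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # 'Â£'

# Hard failure: latin-1 bytes read as UTF-8 are not a valid sequence.
try:
    latin1_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)  # True
```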

Assuming the input files are UTF-8 encoded, change:

with open(f, "r") as infile:

to:

with open(f, "r", encoding="utf-8") as infile:

You may also want to change outfile's encoding to UTF-8 to save space, since UTF-32 stores every code point in four bytes. Because the input is decoded to Unicode strings (str) as it is read, the encodings of infile and outfile don't need to match.
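Putting it together, a minimal sketch of the whole script might look like this. The function name combine_files and its parameters are hypothetical, UTF-8 input is an assumption you need to verify for your files, and sorting the file list and skipping the output file are additions of mine rather than something your original code did.

```python
import glob
import os


def combine_files(pattern, out_path, in_encoding="utf-8",
                  out_encoding="utf-8",
                  skip=("final.txt", "result.txt")):
    """Concatenate files matching pattern into out_path, separated by '|'.

    in_encoding is an assumption about the input files; out_encoding
    can be chosen independently because the text passes through str.
    """
    out_abs = os.path.abspath(out_path)
    with open(out_path, "w", encoding=out_encoding) as outfile:
        for path in sorted(glob.glob(pattern)):
            # Skip excluded names and the output file itself, which
            # would otherwise match the same pattern.
            if os.path.basename(path) in skip:
                continue
            if os.path.abspath(path) == out_abs:
                continue
            with open(path, "r", encoding=in_encoding) as infile:
                outfile.write(infile.read())
                outfile.write("|")

# Example call (not executed here):
# combine_files("*.txt", "REGEXES.rx.txt")
```

If the files turn out not to be UTF-8, pass the real encoding as in_encoding instead of guessing.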

Upvotes: 1
