Mark K

Reputation: 9348

Combining multiple txt files (Python 3, UnicodeDecodeError)

The code below was used in Python 2 to combine all .txt files in a folder. It worked fine.

import os

base_folder = "C:\\FDD\\"

all_files = []

for each in os.listdir(base_folder):
    if each.endswith('.txt'):
        kk = os.path.join(base_folder, each)
        all_files.append(kk)

with open(base_folder + "Combined.txt", 'w') as outfile:
    for fname in all_files:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

In Python 3, it raises an error:

Traceback (most recent call last):
  File "C:\Scripts\thescript.py", line 26, in <module>
    for line in infile:
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'CP_UTF8' codec can't decode byte 0xe4 in position 53: No mapping for the Unicode character exists in the target code page.

I made this change:

with open(fname) as infile:

to

with open(fname, 'r', encoding = 'latin-1') as infile:

With that change, it gives me a MemoryError instead.

How can I correct this error in Python 3? Thank you.

Upvotes: 1

Views: 340

Answers (1)

henrywongkk

Reputation: 1918

As @transilvlad suggested here, use the open function from the codecs module to read the file:

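import codecs
# Drop-in replacement for open(fname) in the question's inner loop
with codecs.open(fname, 'r', encoding='utf-8',
                 errors='ignore') as infile:
    for line in infile:
        outfile.write(line)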

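This ignores any bytes that cannot be decoded as UTF-8 and returns the string without them. Be aware that those characters are dropped silently: the byte 0xe4 from the traceback, for example, is "ä" in Latin-1 and would simply vanish from the combined file.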
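For reference, here is a minimal sketch of the whole script with the same idea applied. It assumes Python 3, where the built-in open also accepts encoding and errors, so codecs is not strictly needed. The function name combine_txt_files and its parameters are made up for illustration. It also skips the output file when collecting inputs, so that a second run does not read from the file it is writing to. Whether that is what caused the MemoryError in the question is only a guess.

```python
import os


def combine_txt_files(folder, out_name="Combined.txt"):
    """Concatenate every .txt file in `folder` into `out_name`.

    Hypothetical helper for illustration; not part of any library.
    """
    out_path = os.path.join(folder, out_name)
    # Collect the inputs before opening the output, and skip the output
    # file itself so a rerun never reads the file it is writing.
    inputs = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".txt") and name != out_name
    )
    with open(out_path, "w", encoding="utf-8") as outfile:
        for path in inputs:
            # errors="ignore" silently drops bytes that are not valid UTF-8.
            with open(path, "r", encoding="utf-8", errors="ignore") as infile:
                for line in infile:
                    outfile.write(line)
    return out_path
```

Calling combine_txt_files("C:\\FDD\\") would then write C:\FDD\Combined.txt. If the source files are really Latin-1 rather than UTF-8, passing encoding="latin-1" when reading (and dropping errors="ignore") would keep the characters instead of discarding them.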

Upvotes: 2
