Reputation: 9348
The code below was used in Python 2 to combine all .txt files in a folder. It worked fine.
import os
base_folder = "C:\\FDD\\"
all_files = []
for each in os.listdir(base_folder):
    if each.endswith('.txt'):
        kk = os.path.join(base_folder, each)
        all_files.append(kk)
with open(base_folder + "Combined.txt", 'w') as outfile:
    for fname in all_files:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
In Python 3, the same code raises an error:
Traceback (most recent call last):
  File "C:\Scripts\thescript.py", line 26, in <module>
    for line in infile:
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'CP_UTF8' codec can't decode byte 0xe4 in position 53: No mapping for the Unicode character exists in the target code page.
I made this change:
with open(fname) as infile:
to
with open(fname, 'r', encoding = 'latin-1') as infile:
Now it gives me a MemoryError instead.
How can I correct this error in Python 3? Thank you.
Upvotes: 1
Views: 340
Reputation: 1918
As @transilvlad suggested here, use the open function from the codecs module to read the file:
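import codecs
# outfile is the already-open output file from the question's code
with codecs.open(fname, 'r', encoding='utf-8', errors='ignore') as infile:
    for line in infile:
        outfile.write(line)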
With errors='ignore', any bytes that cannot be decoded as UTF-8 are dropped, and the returned string simply omits them.
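For reference, here is a minimal sketch of the whole combine step with that error handling applied. In Python 3 the built-in open accepts the same encoding and errors arguments, so the sketch uses it instead of codecs.open. The function name combine_text_files and its output_name parameter are hypothetical, made up for illustration. Skipping the output file while collecting inputs is an added precaution, not something the question's code did, and it is only an assumption that this has any bearing on the MemoryError.

```python
import os
import tempfile


def combine_text_files(folder, output_name="Combined.txt"):
    """Concatenate every .txt file in folder into output_name and return its path.

    Hypothetical helper for illustration only.
    """
    output_path = os.path.join(folder, output_name)
    # Collect the inputs first and skip the output file itself,
    # so it is never read while it is being written.
    inputs = sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.endswith(".txt") and name != output_name
    )
    with open(output_path, "w", encoding="utf-8") as outfile:
        for path in inputs:
            # errors="ignore" silently drops bytes that are not valid UTF-8.
            with open(path, "r", encoding="utf-8", errors="ignore") as infile:
                for line in infile:
                    outfile.write(line)
    return output_path


if __name__ == "__main__":
    # Small demonstration in a throwaway folder.
    with tempfile.TemporaryDirectory() as folder:
        with open(os.path.join(folder, "a.txt"), "wb") as f:
            f.write(b"caf\xe4\n")  # 0xe4 is not valid UTF-8 on its own
        with open(os.path.join(folder, "b.txt"), "wb") as f:
            f.write(b"ok\n")
        combined = combine_text_files(folder)
        with open(combined, encoding="utf-8") as f:
            print(repr(f.read()))
```

Note that errors='ignore' loses data. If the files are really Latin-1 (byte 0xe4 is "ä" in that encoding), reading them with encoding='latin-1' keeps the characters instead of dropping them.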
Upvotes: 2