Reputation: 1
I need to convert multiple CSV files (with different encodings) into UTF-8.
Here is my code:
#find encoding and if not in UTF-8 convert it
import os
import sys
import glob
import chardet
import codecs
myFiles = glob.glob('/mypath/*.csv')
csv_encoding = []
for file in myFiles:
    with open(file, 'rb') as opened_file:
        bytes_file=opened_file.read()
        result=chardet.detect(bytes_file)
        my_encoding=result['encoding']
        csv_encoding.append(my_encoding)
print(csv_encoding)
for file in myFiles:
    if csv_encoding in ['utf-8', 'ascii']:
        print(file + ' in utf-8 encoding')
    else:
        with codecs.open(file, 'r') as file_for_conversion:
            read_file_for_conversion = file_for_conversion.read()
        with codecs.open(file, 'w', 'utf-8') as converted_file:
            converted_file.write(read_file_for_conversion)
        print(file +' converted to utf-8')
When I try to run this code I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf3 in position 5057: invalid continuation byte
Can someone help me? Thanks!!!
Upvotes: 0
Views: 1428
Reputation: 5817
You need to zip the lists myFiles and csv_encoding to get their values aligned:
for file, encoding in zip(myFiles, csv_encoding):
    ...
And you need to specify that value in the open() call:
...
with codecs.open(file, 'r', encoding=encoding) as file_for_conversion:
Note: in Python 3 there's no need to use the codecs module for opening files. Just use the built-in open function and specify the encoding with the encoding parameter.
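Putting both points together, here is a minimal sketch of the conversion step using only the built-in open. The helper name convert_to_utf8 and the UTF8_COMPATIBLE set are made up for illustration, and the encoding argument is assumed to be whatever chardet reported for that file. Treating a missing detection result (None) as "skip" is also an assumption, not something the question specifies.

```python
# Hypothetical helper: re-encode one file to UTF-8 in place.
# `encoding` is assumed to be the value from chardet's result['encoding'].
UTF8_COMPATIBLE = {"utf-8", "ascii"}  # assumed set of encodings to leave alone


def convert_to_utf8(path, encoding):
    """Rewrite `path` as UTF-8; return True if the file was rewritten."""
    # Skip files with no detected encoding or that are already UTF-8 compatible.
    if encoding is None or encoding.lower() in UTF8_COMPATIBLE:
        return False
    # Decode with the detected encoding; newline="" preserves line endings.
    with open(path, "r", encoding=encoding, newline="") as src:
        text = src.read()
    # Write the same text back, this time encoded as UTF-8.
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return True


# Intended use with the lists from the question:
# for file, encoding in zip(myFiles, csv_encoding):
#     if convert_to_utf8(file, encoding):
#         print(file + ' converted to utf-8')
#     else:
#         print(file + ' left unchanged')
```

Because the file is overwritten in place, a wrong guess from the detector would corrupt the data, so it may be worth keeping backups or writing to a new file name while testing.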
Upvotes: 1