Reputation: 43
I've read every post I can find, but my situation seems unique. I'm totally new to Python so this could be basic. I'm getting the following error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 70: character maps to undefined
When I run the code:
import csv
input_file = 'input.csv'
output_file = 'output.csv'
cols_to_remove = [4, 6, 8, 9, 10, 11,13, 14, 19, 20, 21, 22, 23, 24]
cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0 # Current amount of rows processed
with open(input_file, "r") as source:
reader = csv.reader(source)
with open(output_file, "w", newline='') as result:
writer = csv.writer(result)
for row in reader:
row_count += 1
print('\r{0}'.format(row_count), end='')
for col_index in cols_to_remove:
del row[col_index]
writer.writerow(row)
What am I doing wrong?
Upvotes: 4
Views: 10154
Reputation: 149195
In Python 3, the csv module processes the file as unicode strings, and because of that has to first decode the input file. You can use the exact encoding if you know it, or just use Latin1 because it maps every byte to the unicode character with same code point, so that decoding+encoding keep the byte values unchanged. Your code could become:
...
with open(input_file, "r", encoding='Latin1') as source:
reader = csv.reader(source)
with open(output_file, "w", newline='', encoding='Latin1') as result:
...
Upvotes: 6
Reputation: 380
Add encoding="utf8"
while opening file. Try below instead:
with open(input_file, "r", encoding="utf8") as source:
reader = csv.reader(source)
with open(output_file, "w", newline='', encoding="utf8") as result:
Upvotes: 4
Reputation: 1420
pandas
input_file = pandas.read_csv('input.csv')
output_file = pandas.read_csv('output.csv')
Upvotes: 0