Momboosa

Reputation: 43

csv read raises "UnicodeDecodeError: 'charmap' codec can't decode..."

I've read every post I can find, but my situation seems unique. I'm totally new to Python so this could be basic. I'm getting the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 70: character maps to <undefined>

When I run the code:

import csv

input_file = 'input.csv'
output_file = 'output.csv'
cols_to_remove = [4, 6, 8, 9, 10, 11,13, 14, 19, 20, 21, 22, 23, 24]

cols_to_remove = sorted(cols_to_remove, reverse=True)
row_count = 0 # Current amount of rows processed

with open(input_file, "r") as source:
    reader = csv.reader(source)
    with open(output_file, "w", newline='') as result:
        writer = csv.writer(result)
        for row in reader:
            row_count += 1
            print('\r{0}'.format(row_count), end='')
            for col_index in cols_to_remove:
                del row[col_index]
            writer.writerow(row)

What am I doing wrong?

Upvotes: 4

Views: 10154

Answers (3)

Serge Ballesta

Reputation: 149195

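In Python 3, the csv module works on unicode strings, so the input file has to be decoded before csv sees it. Because your open() call names no encoding, Python falls back to the platform default, which on Windows is usually a code page such as cp1252 (the 'charmap' codec in the traceback), and that code page has no character for byte 0x8d. You can pass the exact encoding if you know it, or just use Latin1: it maps every byte to the unicode character with the same code point, so decoding followed by encoding leaves the byte values unchanged. Your code could become: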

...
with open(input_file, "r", encoding='Latin1') as source:
    reader = csv.reader(source)
    with open(output_file, "w", newline='', encoding='Latin1') as result:
        ...
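To illustrate the round-trip claim, here is a minimal sketch that uses in-memory bytes instead of the real input.csv. The helper name can_decode is made up for this example, and cp1252 is only an assumption about what the platform default was on the asker's machine.

```python
# All 256 possible byte values, standing in for arbitrary file content.
raw = bytes(range(256))

# Latin-1 assigns a character to every byte, so decoding never fails
# and encoding again gives back exactly the same bytes.
round_tripped = raw.decode("latin-1").encode("latin-1")


def can_decode(data, encoding):
    """Return True if `data` decodes cleanly with `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Byte 0x8d from the traceback: fine in Latin-1, undefined in cp1252
# (cp1252 is assumed here to be the default that produced the error).
print(can_decode(b"\x8d", "latin-1"))
print(can_decode(b"\x8d", "cp1252"))
```

Note that Latin1 only guarantees the bytes survive unchanged. If the file is really in another encoding, non-ASCII characters will look wrong inside Python even though the output file ends up byte-identical in the kept columns.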

Upvotes: 6

shubhambharti201

Reputation: 380

Add encoding="utf8" when opening the files. Try this instead:

with open(input_file, "r", encoding="utf8") as source:
    reader = csv.reader(source)
    with open(output_file, "w", newline='', encoding="utf8") as result:
        ...
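If you are not sure whether the file really is UTF-8, one option is to try a few candidate encodings and use the first one that decodes the whole file. This is only a sketch: pick_encoding and its candidate list are hypothetical, and a clean decode does not prove the encoding is the right one (Latin-1 in particular never fails).

```python
def pick_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes the whole file without error.

    Hypothetical helper: the candidate list is an assumption, so put the
    encodings you consider most likely first.
    """
    # Read raw bytes so no implicit decoding happens here.
    with open(path, "rb") as f:
        data = f.read()
    for encoding in candidates:
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue  # Not this one, try the next candidate.
    raise ValueError("none of the candidate encodings could decode " + path)
```

The result can then be passed along, for example open(input_file, "r", encoding=pick_encoding(input_file)).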

Upvotes: 4

Hamza Zubair

Reputation: 1420

  1. Try pandas:

import pandas

input_file = pandas.read_csv('input.csv')
output_file = pandas.read_csv('output.csv')

  2. Try saving the file again as CSV UTF-8.
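Re-saving as UTF-8 can also be done from Python rather than from a spreadsheet program. A minimal sketch, assuming you already know (or have guessed) the source encoding; the function name transcode and its parameters are made up for this example.

```python
def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8"):
    """Copy a text file, decoding with src_encoding and re-encoding with dst_encoding.

    Hypothetical helper. newline="" on both files keeps the original
    line endings untouched, which is also what the csv module expects.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)
```

For example, transcode('input.csv', 'input_utf8.csv', 'latin-1') writes a UTF-8 copy that can then be opened with encoding="utf8". The second file name here is only illustrative.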

Upvotes: 0
