N08

Reputation: 1315

Reading a CSV file in Python containing undefined characters

I am reading a CSV file (ANSI-encoded) on my Windows machine in Python using this code:

import csv
with open('ttest.dat') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter="\t")
    for i in csvReader:
        print(i)

However, I get the error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4: character maps to <undefined>

Upon inspecting the file in Notepad++ (after converting it to UTF-8 encoding in Notepad) I see that the following appears:

[Screenshot of the file contents in Notepad++; image not preserved]

It seems that the characters adjacent to "hello" are causing the issue. When I remove them manually, the file can be read.

Is there a way to load the file in Python while explicitly telling it to disregard these odd characters? Or, alternatively, is there a way to strip these characters from the text automatically? My file is rather large, so checking each line manually isn't realistic.

Note: In R I can read the file without any issues using read.csv

Upvotes: 1

Views: 2409

Answers (1)

ltd9938

Reputation: 1454

with open('ttest.dat', encoding="utf8") as csvDataFile:

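This opens the file as UTF-8 instead of the Windows default encoding (typically cp1252, which is the 'charmap' codec named in the error message). Byte 0x9d has no character assigned in cp1252, which is why the default decoding fails. Note that this only helps if the file really is UTF-8 encoded.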
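
If the file turns out not to be valid UTF-8, or you simply want Python to skip bytes it cannot decode, open() also accepts an errors argument. Here is a minimal sketch, assuming the file is tab-delimited as in the question. read_rows is a hypothetical helper name used for illustration, not part of the csv module:

```python
import csv


def read_rows(path, encoding="utf-8", errors="replace"):
    """Read a tab-delimited file without raising on undecodable bytes.

    read_rows is a hypothetical helper; the defaults are assumptions,
    so adjust encoding and errors to suit your file.
    """
    # errors="replace" substitutes U+FFFD for each undecodable byte;
    # errors="ignore" drops those bytes instead.
    # newline="" is what the csv module documentation recommends.
    with open(path, encoding=encoding, errors=errors, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


if __name__ == "__main__":
    for row in read_rows("ttest.dat", encoding="cp1252", errors="ignore"):
        print(row)
```

With errors="ignore" the offending bytes are dropped silently, while errors="replace" leaves a visible marker where they were. Either way some data is discarded, so opening the file with its correct encoding is preferable when you can determine it.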

Upvotes: 2
