Reputation: 1315
I am reading a CSV file (ANSI-encoded) on my Windows machine in Python using this code:
    import csv

    with open('ttest.dat') as csvDataFile:
        csvReader = csv.reader(csvDataFile, delimiter="\t")
        for i in csvReader:
            print(i)
However, I get the error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4: character maps to <undefined>
Upon inspecting the file in Notepad++ (after converting it to UTF-8 encoding in Notepad) I see that the following appears:
It seems that these characters adjacent to "hello" are causing the issue. When I remove them manually, the file can be read.
Is there a way to load the file in Python while explicitly telling it to disregard these odd characters? Or, alternatively, is there a method to strip these characters from the text automatically? My file is rather large, so it isn't realistic for me to look through each line manually.
Note: In R I can read the file without any issues using read.csv
Upvotes: 1
Views: 2409
Reputation: 1454
with open('ttest.dat', encoding="utf8") as csvDataFile:
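This opens the file as UTF-8 instead of the Windows default codec. The 'charmap' codec named in the error is typically cp1252, in which byte 0x9d is undefined.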
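If the file turns out not to be clean UTF-8 either, the `errors` argument of `open` can tell Python to drop or replace bytes it cannot decode, which is roughly what the question asks for. Below is a minimal sketch, assuming the file is tab-delimited as in the question; the helper name `iter_tsv_rows` is made up for illustration and is not part of the `csv` module.

```python
import csv


def iter_tsv_rows(path, encoding="utf-8", errors="strict"):
    """Yield the rows of a tab-delimited file one at a time.

    iter_tsv_rows is a hypothetical helper name used for this sketch.
    errors="ignore" silently drops undecodable bytes;
    errors="replace" substitutes U+FFFD for each of them.
    """
    # newline="" is what the csv module documentation recommends
    # for file objects passed to csv.reader.
    with open(path, encoding=encoding, errors=errors, newline="") as f:
        yield from csv.reader(f, delimiter="\t")


# Example usage (the file name is taken from the question):
# for row in iter_tsv_rows("ttest.dat", errors="replace"):
#     print(row)
```

Because the rows are yielded lazily, this should also work for a large file without loading it into memory at once. Note that `errors="ignore"` discards data without warning, so `errors="replace"` may be the safer choice while you are still working out what the real encoding is.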
Upvotes: 2