CSV reader picks up garbage in the first few characters

Question

I am trying to read the first line of a CSV file and assign it to header. The CSV file looks like this:

TIME,DAY,MONTH,YEAR
"3:21","23","FEB","2018"
"3:23","23","FEB","2018"
...

Here is the code:

import csv

with open("20180223.csv") as csvfile:
    rdr = csv.reader(csvfile)
    header = next(rdr)
    print(header)

I expect the output to look like:

['TIME', 'DAY', 'MONTH', 'YEAR']

However, the output looks like this:

['ï»¿TIME', 'DAY', 'MONTH', 'YEAR']

What did I miss?

sjw · Accepted Answer

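Those first three characters are the UTF-8 byte order mark (BOM): the bytes EF BB BF at the start of the file, decoded with a single-byte encoding such as cp1252 instead of UTF-8.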
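
A short sketch of where the stray characters come from. It assumes the file was decoded as cp1252, a common default on Windows; the question does not say which encoding was actually in effect.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Decoded as cp1252 (assumed platform default), each byte becomes one character.
garbled = bom.decode("cp1252")

# Decoded as UTF-8, the same three bytes are the single character U+FEFF.
proper = bom.decode("utf-8")

# The utf-8-sig codec drops the mark entirely when it leads the data.
stripped = (bom + b"TIME").decode("utf-8-sig")

print(repr(garbled), repr(proper), repr(stripped))
```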

Try this:

with open("20180223.csv", encoding="utf-8-sig") as csvfile:

This advice is somewhat hidden away in the documentation, but it is there:

In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. Use the ‘utf-8-sig’ codec to automatically skip the mark if present for reading such files.
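
For completeness, here is a minimal runnable sketch of the fix. The helper name read_header and the throwaway sample file are made up for illustration; the sample only mimics the file from the question. Passing newline="" follows the csv module's own recommendation for files handed to csv.reader.

```python
import csv
import os
import tempfile


def read_header(path):
    """Return the first CSV row, skipping a UTF-8 BOM if one is present."""
    # utf-8-sig strips a leading BOM and otherwise behaves like utf-8.
    with open(path, encoding="utf-8-sig", newline="") as csvfile:
        return next(csv.reader(csvfile))


# Build a throwaway file that starts with a BOM (hypothetical sample data).
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b'\xef\xbb\xbfTIME,DAY,MONTH,YEAR\r\n"3:21","23","FEB","2018"\r\n')
    sample_path = tmp.name

header = read_header(sample_path)
os.remove(sample_path)
print(header)
```

Because utf-8-sig only skips the mark if it is present, the same call also works on UTF-8 files saved without a BOM.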
