I am trying to read the first line of a CSV file and assign it to header. The CSV file looks like this:
TIME,DAY,MONTH,YEAR
"3:21","23","FEB","2018"
"3:23","23","FEB","2018"
...
Here is the code:
import csv
with open("20180223.csv") as csvfile:
    rdr = csv.reader(csvfile)
    header = next(rdr)
    print(header)
I expect the output to look like:
['TIME', 'DAY', 'MONTH', 'YEAR']
However, the output looks like this, with an extra character at the start of the first element, before TIME:
['TIME', 'DAY', 'MONTH', 'YEAR']
What did I miss?
In PHP, if you know for sure the file starts with a UTF-8 Byte Order Mark, you can get rid of it by skipping its first 3 bytes (EF BB BF):
$ss = substr(file_get_contents('/path/to/file.csv'), 3);
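For comparison, here is a Python sketch of the same byte-level idea. Unlike a fixed 3-byte substr, it first checks that the mark is really there, so a file without a BOM keeps its first bytes. codecs.BOM_UTF8 is the standard library's constant for the bytes EF BB BF; the helper name strip_utf8_bom and the sample bytes are hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) only if it is actually present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b"TIME,DAY,MONTH,YEAR\n"
without_bom = b"TIME,DAY,MONTH,YEAR\n"
print(strip_utf8_bom(with_bom))     # b'TIME,DAY,MONTH,YEAR\n'
print(strip_utf8_bom(without_bom))  # unchanged
```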
That first character is the byte order mark (BOM), U+FEFF.
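A small sketch of why this happens, assuming the file begins with the UTF-8 BOM bytes: decoding with plain UTF-8 keeps the mark as an ordinary U+FEFF character, so the first CSV field becomes '\ufeffTIME' instead of 'TIME'. The sample bytes below are hypothetical, not read from the real 20180223.csv.

```python
import codecs

# Hypothetical first line of a CSV file that was written with a leading BOM.
raw = codecs.BOM_UTF8 + b"TIME,DAY,MONTH,YEAR"

print(raw[:3])                  # b'\xef\xbb\xbf', the UTF-8 encoding of U+FEFF
decoded = raw.decode("utf-8")   # plain UTF-8 keeps the mark as a character
print(repr(decoded[0]))         # '\ufeff'
print(decoded.split(",")[0] == "TIME")  # False: the first field is '\ufeffTIME'
```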
Try this:
with open("20180223.csv", encoding="utf-8-sig") as csvfile:
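Spelled out in full, a minimal sketch of the question's code with that change might look like the following. The helper name read_header is hypothetical, and newline="" is an addition that the csv module documentation recommends for files passed to csv.reader; the demo writes a throwaway file shaped like the question's sample, with a BOM in front.

```python
import csv
import os
import tempfile


def read_header(path):
    """Return the first CSV row; utf-8-sig drops a leading BOM if there is one."""
    # newline="" is the csv module's recommended way to open files for csv.reader.
    with open(path, encoding="utf-8-sig", newline="") as csvfile:
        rdr = csv.reader(csvfile)
        return next(rdr)


# Demo: a throwaway file like the question's sample, written WITH a BOM
# (the "utf-8-sig" encoder prepends EF BB BF).
sample = 'TIME,DAY,MONTH,YEAR\n"3:21","23","FEB","2018"\n'
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-8-sig"))
try:
    header = read_header(tmp.name)
    print(header)  # ['TIME', 'DAY', 'MONTH', 'YEAR']
finally:
    os.remove(tmp.name)
```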
This advice is somewhat hidden away in the Python documentation for the codecs module, but it is there:
In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. Use the ‘utf-8-sig’ codec to automatically skip the mark if present for reading such files.
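To illustrate the "if present" part of that quote: utf-8-sig strips the mark when it is there and leaves the text alone when it is not, which makes it a reasonable choice for reading when you are unsure whether a file has a BOM. Writing behaves differently, since the utf-8-sig encoder always adds the mark. The sample bytes are hypothetical.

```python
import codecs

with_bom = codecs.BOM_UTF8 + b"TIME,DAY"
without_bom = b"TIME,DAY"

# Decoding: the mark is dropped when present...
print(repr(with_bom.decode("utf-8-sig")))     # 'TIME,DAY'
# ...and nothing is lost when it is absent.
print(repr(without_bom.decode("utf-8-sig")))  # 'TIME,DAY'

# Encoding: utf-8-sig always writes the mark in front.
print("TIME".encode("utf-8-sig"))             # b'\xef\xbb\xbfTIME'
```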