Reputation: 2005
I have a file with 3'502'379 rows and 3 columns. The following script is supposed to be executed but raises and error in the date handling line:
import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas
path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
This is the error that occurs:
Traceback (most recent call last):
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
return self._engine.get_loc(key)
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\data\script.py", line 15, in <module>
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
return self._getitem_column(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
return self._get_item_cache(key)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
values = self._data.get(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
loc = self.items.get_loc(item)
File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'
Upvotes: 1
Views: 1607
Reputation: 210882
the '\ufeffDATE'
in the first column name shows that your CSV file has a UTF-16 Byte Order Mark (BOM) signature so it must be read accordingly.
so try this when reading your CSV:
df = pd.read_csv(path, sep=';', encoding='utf-8-sig')
or as @EdChum suggested:
df = pd.read_csv(path, sep=';', encoding='utf-16')
both variants should work properly
PS this answer shows how to deal with BOMs
Upvotes: 4