Spurious

Reputation: 2005

Python pandas producing error when trying to access 'DATE' column on large data set

I have a file with 3'502'379 rows and 3 columns. The following script is supposed to run, but it raises an error on the date-handling line:

import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas

path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')

This is the error that occurs:

Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
    return self._engine.get_loc(key)
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\data\script.py", line 15, in <module>
    data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'

Upvotes: 1

Views: 1607

Answers (1)

MaxU - stand with Ukraine

Reputation: 210882

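The '\ufeffDATE' in the first column name shows that your CSV file starts with a Byte Order Mark (BOM), most likely a UTF-8 BOM that was not stripped when the file was decoded. The first column is therefore named '\ufeffDATE' rather than 'DATE', which is why the lookup raises KeyError: 'DATE'. The file has to be read with an encoding that consumes the BOM.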

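So try this when reading your CSV (using the names from your script):

data = pandas.read_csv(path, sep=';', encoding='utf-8-sig')

or, as @EdChum suggested:

data = pandas.read_csv(path, sep=';', encoding='utf-16')

Which variant works depends on how the file was actually saved: 'utf-8-sig' is for UTF-8 with a BOM, 'utf-16' is for a file that is really UTF-16.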
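To see the effect without touching the real file, here is a minimal sketch. The sample bytes and the PRICE column are made up for illustration (the question only names the DATE column); the in-memory buffer stands in for data_prices.csv.

```python
import codecs
import io

import pandas as pd

# Hypothetical sample standing in for data_prices.csv:
# UTF-8 encoded text preceded by a UTF-8 BOM.
raw = codecs.BOM_UTF8 + "DATE;PRICE\n20160104;10.5\n20160105;10.7\n".encode("utf-8")

# Decoded as plain UTF-8, the BOM survives as the character '\ufeff'
# glued to the front of the first header field.
print(repr(raw.decode("utf-8")[:5]))

# The 'utf-8-sig' codec drops a leading BOM if one is present.
print(repr(raw.decode("utf-8-sig")[:4]))

# Reading with 'utf-8-sig' gives a clean 'DATE' column name,
# so the date conversion from the question goes through.
data = pd.read_csv(io.BytesIO(raw), sep=";", encoding="utf-8-sig")
data["DATE"] = pd.to_datetime(data["DATE"], format="%Y%m%d")
print(data.columns.tolist())
```

On your own data, print(data.columns.tolist()) right after read_csv is a quick way to spot hidden characters in the header.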

PS this answer shows how to deal with BOMs
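If you are not sure which encoding the file has, one option is to look at its first few bytes before calling read_csv. The helper below is only a sketch: guess_bom_encoding is a hypothetical name, not a pandas function, and it only covers UTF-8 and UTF-16 BOMs (UTF-32 is deliberately ignored).

```python
import codecs


def guess_bom_encoding(first_bytes):
    """Return a codec name based on a leading BOM, or None if there is none.

    Hypothetical helper for illustration; it checks UTF-8 and UTF-16 only.
    """
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # Python's 'utf-16' codec reads the BOM to pick the byte order.
        return "utf-16"
    return None


# Made-up byte strings standing in for the start of a CSV file.
print(guess_bom_encoding(codecs.BOM_UTF8 + b"DATE;PRICE"))
print(guess_bom_encoding("DATE;PRICE".encode("utf-16")))
print(guess_bom_encoding(b"DATE;PRICE"))

# With a real file, the first bytes could be read like this:
#     with open(path, "rb") as fh:
#         encoding = guess_bom_encoding(fh.read(4))
# and the result passed to read_csv as encoding=... when it is not None.
```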

Upvotes: 4
