Trying to change entries in dataframe raising key error

Question

I have created a dataframe called autos from this dataset. Three columns in this dataframe have dates as entries. I want to remove the hours, minutes and seconds part of each date. Example:

import pandas as pd

data = [["2016-03-24 11:52:17"], ["2016-03-24 10:58:45"], ["2016-03-14 12:52:21"]]
auto = pd.DataFrame(data, columns=['date_crawled'])

Output:

          date_crawled
0  2016-03-24 11:52:17
1  2016-03-24 10:58:45
2  2016-03-14 12:52:21

I thought I could do this by creating the following function, which would take in a date column and format it.

import datetime as dt
def datetimeconv(date_column):
    for i in range(0,371528,1):
        for elements in auto[i,date_column]:
            elements=dt.datetime.strptime(elements,"%Y-%m-%d %H:%M:%S")
            elements=elements.strftime("%d-%m-%Y")
            auto.loc[i,date_column]=(elements)

When I tried to test it out on the date_crawled column:

datetimeconv("date_crawled")

I got the following error:

KeyError                                  Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3077             try:
-> 3078                 return self._engine.get_loc(key)
   3079             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: (0, 'date_crawled')

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
 in 
      6             elements=elements.strftime("%d-%m-%Y")
      7             auto.loc[i,date_column]=(elements)
----> 8 datetimeconv("date_crawled")
      9 

 in datetimeconv(date_column)
      2 def datetimeconv(date_column):
      3     for i in range(0,371528,1):
----> 4         for elements in auto[i,date_column]:
      5             elements=dt.datetime.strptime(elements,"%Y-%m-%d %H:%M:%S")
      6             elements=elements.strftime("%d-%m-%Y")

~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2686             return self._getitem_multilevel(key)
   2687         else:
-> 2688             return self._getitem_column(key)
   2689 
   2690     def _getitem_column(self, key):

~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2693         # get column
   2694         if self.columns.is_unique:
-> 2695             return self._get_item_cache(key)
   2696 
   2697         # duplicate columns & possible reduce dimensionality

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   2487         res = cache.get(item)
   2488         if res is None:
-> 2489             values = self._data.get(item)
   2490             res = self._box_item_values(item, values)
   2491             cache[item] = res

~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3078                 return self._engine.get_loc(key)
   3079             except KeyError:
-> 3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3081 
   3082         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: (0, 'date_crawled')

Why am I getting a key error?

Stef · Accepted Answer

Reason for KeyError:

You must use df.loc[i,'date_crawled'] instead of df[i,'date_crawled']. The latter passes the tuple (i,'date_crawled') as a single column key, which could only match a column in a hierarchical index (MultiIndex). No such column exists in your dataframe, hence the KeyError.
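The difference can be reproduced on the sample frame from the question. This is only a minimal sketch; the names raised and value are introduced here for illustration.

```python
import pandas as pd

data = [["2016-03-24 11:52:17"], ["2016-03-24 10:58:45"], ["2016-03-14 12:52:21"]]
auto = pd.DataFrame(data, columns=["date_crawled"])

# Plain square brackets treat (0, "date_crawled") as one column label,
# and no column with that label exists.
try:
    auto[0, "date_crawled"]
    raised = False
except KeyError:
    raised = True

# .loc takes (row label, column label) and returns the single cell.
value = auto.loc[0, "date_crawled"]

print(raised)  # True
print(value)   # 2016-03-24 11:52:17
```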

The normal pandas way to do it is:

auto['date_crawled'] = auto['date_crawled'].apply(lambda x: pd.to_datetime(x).strftime("%d-%m-%Y"))

Alternatively, as Nils Werner remarked in his comment, you can convert the whole column at once:

auto['date_crawled'] = pd.to_datetime(auto['date_crawled']).dt.strftime("%d-%m-%Y")
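As a rough end-to-end check, here is the column-wise version applied to the sample data from the question. Passing an explicit format to pd.to_datetime is an optional addition made here, not part of the original one-liner, and parsed is an illustrative name.

```python
import pandas as pd

data = [["2016-03-24 11:52:17"], ["2016-03-24 10:58:45"], ["2016-03-14 12:52:21"]]
auto = pd.DataFrame(data, columns=["date_crawled"])

# Parse the whole column in one call, then format every entry as day-month-year.
parsed = pd.to_datetime(auto["date_crawled"], format="%Y-%m-%d %H:%M:%S")
auto["date_crawled"] = parsed.dt.strftime("%d-%m-%Y")

print(auto["date_crawled"].tolist())
# ['24-03-2016', '24-03-2016', '14-03-2016']
```

Note that strftime produces strings, so the column no longer holds datetime values afterwards. If you need to keep doing date arithmetic, something like parsed.dt.normalize() (which sets the time to midnight) may suit better.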

To answer your question of why your code doesn't work (besides the KeyError): each entry is a string, so the loop "for elements in auto.loc[i,date_column]" iterates over the individual characters of that entry. The following would be a working version:

def datetimeconv(date_column):
    # assumes auto has the default 0..n-1 row index
    for i in range(len(auto)):
        elements = auto.loc[i, date_column]
        elements = dt.datetime.strptime(elements, "%Y-%m-%d %H:%M:%S")
        elements = elements.strftime("%d-%m-%Y")
        auto.loc[i, date_column] = elements

However, never iterate explicitly over dataframe rows; use pandas methods whenever possible. This code is just to illustrate where your error was.
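Since the question mentions three date columns, one possible way to apply the column-wise approach to all of them is sketched below. The helper strip_time and the column names ad_created and last_seen are assumptions made for illustration; only date_crawled appears in the question.

```python
import pandas as pd

auto = pd.DataFrame({
    "date_crawled": ["2016-03-24 11:52:17", "2016-03-24 10:58:45"],
    "ad_created": ["2016-03-24 00:00:00", "2016-03-23 00:00:00"],  # hypothetical column
    "last_seen": ["2016-04-07 03:16:57", "2016-04-07 01:46:50"],   # hypothetical column
})

def strip_time(frame, columns, out_format="%d-%m-%Y"):
    """Return a copy of frame with the given columns as date-only strings."""
    result = frame.copy()
    for column in columns:  # loops over column names, not over rows
        result[column] = pd.to_datetime(result[column]).dt.strftime(out_format)
    return result

converted = strip_time(auto, ["date_crawled", "ad_created", "last_seen"])
print(converted)
```

The loop here runs once per column rather than once per row, so the per-entry work still happens inside pandas. Returning a copy instead of modifying auto in place is a design choice of this sketch, not something the original code did.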
