Reputation: 483
I have created a dataframe called autos from this dataset. Three columns in this dataframe have dates as entries. I want to remove the hours, minutes and seconds part of each date. Example:
import pandas as pd
data = [["2016-03-24 11:52:17"], ["2016-03-24 10:58:45"], ["2016-03-14 12:52:21"]]
auto = pd.DataFrame(data, columns = ['date_crawled'])
Output:
date_crawled
0 2016-03-24 11:52:17
1 2016-03-24 10:58:45
2 2016-03-14 12:52:21
I thought I could do this by creating the following function, which would take in a date column and format it.
import datetime as dt
def datetimeconv(date_column):
    for i in range(0,371528,1):
        for elements in auto[i,date_column]:
            elements=dt.datetime.strptime(elements,"%Y-%m-%d %H:%M:%S")
            elements=elements.strftime("%d-%m-%Y")
            auto.loc[i,date_column]=(elements)
When I tried to test it out on the date_crawled column:
datetimeconv("date_crawled")
I got the following error:
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: (0, 'date_crawled')
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-71-2e8c9398d8c4> in <module>
6 elements=elements.strftime("%d-%m-%Y")
7 auto.loc[i,date_column]=(elements)
----> 8 datetimeconv("date_crawled")
9
<ipython-input-71-2e8c9398d8c4> in datetimeconv(date_column)
2 def datetimeconv(date_column):
3 for i in range(0,371528,1):
----> 4 for elements in auto[i,date_column]:
5 elements=dt.datetime.strptime(elements,"%Y-%m-%d %H:%M:%S")
6 elements=elements.strftime("%d-%m-%Y")
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2686 return self._getitem_multilevel(key)
2687 else:
-> 2688 return self._getitem_column(key)
2689
2690 def _getitem_column(self, key):
~\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2693 # get column
2694 if self.columns.is_unique:
-> 2695 return self._get_item_cache(key)
2696
2697 # duplicate columns & possible reduce dimensionality
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
2487 res = cache.get(item)
2488 if res is None:
-> 2489 values = self._data.get(item)
2490 res = self._box_item_values(item, values)
2491 cache[item] = res
~\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
4113
4114 if not isna(item):
-> 4115 loc = self.items.get_loc(item)
4116 else:
4117 indexer = np.arange(len(self.items))[isna(self.items)]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3078 return self._engine.get_loc(key)
3079 except KeyError:
-> 3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081
3082 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: (0, 'date_crawled')
Why am I getting a key error?
Upvotes: 0
Views: 724
Reputation: 30609
Reason for KeyError:
You must use df.loc[i,'date_crawled'] instead of df[i,'date_crawled']. The latter passes the tuple (i,'date_crawled') as a single column label, which is how you would select a column (Series) from a hierarchical column index (MultiIndex). Such a column doesn't exist in your dataframe, hence the KeyError.
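The difference between the two lookups can be checked on the sample data from the question. This is a minimal sketch; the variable names raised and cell are only for illustration.

```python
import pandas as pd

data = [["2016-03-24 11:52:17"], ["2016-03-24 10:58:45"], ["2016-03-14 12:52:21"]]
auto = pd.DataFrame(data, columns=["date_crawled"])

# auto[0, "date_crawled"] looks for a column labelled with the tuple
# (0, "date_crawled"); no such column exists, so pandas raises KeyError.
try:
    auto[0, "date_crawled"]
    raised = False
except KeyError:
    raised = True

# .loc takes a row label and a column label and returns that single cell.
cell = auto.loc[0, "date_crawled"]

print(raised)  # True
print(cell)    # 2016-03-24 11:52:17
```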
The normal pandas way to do it is:
auto['date_crawled'] = auto['date_crawled'].apply(lambda x: pd.to_datetime(x).strftime("%d-%m-%Y"))
Alternatively, as Nils Werner remarked in his comment, also:
auto['date_crawled'] = pd.to_datetime(auto['date_crawled']).dt.strftime("%d-%m-%Y")
There is a second problem in your function: even with .loc, writing for elements in auto.loc[i,date_column] iterates over the individual characters of each entry, because each entry is a string. The following would be a working version:
def datetimeconv(date_column):
    for i in range(0,len(auto)):
        elements=auto.loc[i,date_column]
        elements=dt.datetime.strptime(elements,"%Y-%m-%d %H:%M:%S")
        elements=elements.strftime("%d-%m-%Y")
        auto.loc[i,date_column]=(elements)
However, avoid iterating explicitly over dataframe rows; use pandas methods whenever possible. This code is just to illustrate where your error was.
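Both points can be seen on the question's sample data: looping over a single cell yields characters, while the vectorized call converts the whole column at once. This is a sketch only; passing format= to pd.to_datetime is optional and is an assumption that every entry follows that pattern.

```python
import pandas as pd

data = [["2016-03-24 11:52:17"], ["2016-03-24 10:58:45"], ["2016-03-14 12:52:21"]]
auto = pd.DataFrame(data, columns=["date_crawled"])

# Iterating over one cell walks through the string character by character.
first_chars = [ch for ch in auto.loc[0, "date_crawled"]][:4]

# Vectorized conversion of the whole column, with no explicit row loop.
converted = pd.to_datetime(
    auto["date_crawled"], format="%Y-%m-%d %H:%M:%S"
).dt.strftime("%d-%m-%Y")

print(first_chars)         # ['2', '0', '1', '6']
print(converted.tolist())  # ['24-03-2016', '24-03-2016', '14-03-2016']
```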
Upvotes: 1
Reputation: 36775
You can transform these columns to a DateTime column and drop the time using pd.to_datetime(column).dt.date:
df[['date_crawled', 'ad_created', 'last_seen']] = df[['date_crawled', 'ad_created', 'last_seen']].apply(lambda x: pd.to_datetime(x).dt.date)
df
# date_crawled ad_created last_seen
# 0 2016-03-24 2016-03-24 2016-04-07
# 1 2016-03-24 2016-03-24 2016-04-07
# 2 2016-03-14 2016-03-14 2016-04-05
# 3 2016-03-17 2016-03-17 2016-03-17
# 4 2016-03-31 2016-03-31 2016-04-06
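One thing worth knowing about .dt.date is that it returns plain Python date objects, so the column ends up with dtype object. If you want to keep a datetime dtype, .dt.normalize() is an alternative (not part of the answer above) that sets the time to midnight instead of removing it. A small sketch on two made-up rows in the same format:

```python
import pandas as pd

df = pd.DataFrame({"date_crawled": ["2016-03-24 11:52:17", "2016-03-14 12:52:21"]})
parsed = pd.to_datetime(df["date_crawled"])

# .dt.date gives datetime.date objects; the Series dtype becomes object.
as_dates = parsed.dt.date

# .dt.normalize() keeps a datetime dtype and resets the time to 00:00:00.
normalized = parsed.dt.normalize()

print(as_dates.dtype)
print(normalized.dtype)
```

Which one fits depends on whether you need datetime operations (such as .dt accessors or resampling) on the column afterwards.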
Upvotes: 0