havingaball

Reputation: 378

Trouble using date/datetime to create new series using Pandas

I have some financial data that I'm playing around with on AWS to learn some new things. I downloaded the data using the yfinance module. I'm not sure if/how I could include a CSV file with the data, but here is a crop of df.head() to give you an idea of what it looks like. It's daily prices sorted by Date in YYYY-MM-DD format.

Ultimately, I would like to break this dataframe into separate pandas series based on calendar years. Some searching suggests that I should use something like

df['Date'] = pd.to_datetime(df['Date'], format="%Y-%m-%d")

to convert the column to datetime, from which I should be able to convert to a series relatively easily. However, I have tried many variants of this and keep getting a long traceback:

KeyError                                  Traceback (most recent call last)
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Date'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-82-c28bc405dae4> in <module>
      3 SP500_df = fill_nan_with_mean(SP500)
      4 
----> 5 df['Date'] = pd.to_datetime(df['Date'], format="%Y-%m-%d")

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Date'

I'm not sure what to do about this error. I currently think it is caused by one of: i. my incompetence, ii. Date not being stored the way I think it is and therefore not converting correctly, or iii. something inherent to AWS.

Does anyone have a suggestion for what may be happening here? Failing that, does anyone have a suggestion for a workaround that avoids pd.to_datetime entirely?

Thanks in advance

Upvotes: 2

Views: 310

Answers (1)

havingaball

Reputation: 378

MrFuppes is correct: the dates are in the index, not in a column named Date, which is why df['Date'] raised the KeyError. It was enough to do df.index = pd.to_datetime(df.index). I had tried resetting the index before accessing the column and got the same error, but at least this works.
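
For anyone landing here later, here is a minimal sketch of the index conversion followed by the per-year split I was originally after. It assumes the dates sit in the index as strings, as they appeared to in my case. The toy frame, the Close values and the name close_by_year are made up for illustration and are not my actual data.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: daily prices with the
# dates stored as strings in the index (named "Date"), not in a column.
dates = pd.date_range("2019-12-30", periods=6, freq="D").strftime("%Y-%m-%d")
df = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.Index(dates, name="Date"),
)

# "Date" is only the index name, so df["Date"] would raise KeyError.
print("Date" in df.columns)  # False

# Convert the index itself rather than a column.
df.index = pd.to_datetime(df.index)

# Build one Series per calendar year, keyed by the year as a plain int.
close_by_year = {
    int(year): series for year, series in df["Close"].groupby(df.index.year)
}

for year, series in close_by_year.items():
    print(year, len(series))
```

If you would rather have Date as a regular column, df = df.reset_index() should also work, but note that reset_index returns a new DataFrame and leaves the original unchanged unless you assign the result back.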

Upvotes: 1
