Is datetime data in Pandas supposed to be in the index?

Question

By supposed to, what I mean is Is that the way Pandas is designed?, Are all Pandas time series functions built upon that assumption?

A few weeks ago I was experimenting with pandas.rolling_mean which seemed to want the datetime to be in the index.

Given a dataframe like this:

df = pd.DataFrame({'date' : ['23/10/2017', '24/10/2017', '25/10/2017','26/10/2017','27/10/2017'], 'dax-close' : [13003.14, 13013.19, 12953.41,13133.28,13217.54]})
df['date'] = pd.to_datetime(df['date'])
df

...is it important to always do this:

df.set_index('date', inplace=True)
df

...as one of the first steps of an analysis?

Andy Hayden · Accepted Answer

The short answer is usually timeseries data has date as a DatetimeIndex. and many pandas functions do make use of that e.g. resample is a big one.

That said, you don't need to have Dates as an index, for example you may even have multiple Datetime columns, then you're out of luck calling the vanilla resample... however you can use pd.Grouper to define the "resample" on a column (or as part of a larger/multi-column groupby)

In [11]: df.groupby(pd.Grouper(key="date", freq="2D")).sum()
Out[11]:
            dax-close
date
2017-10-23   26016.33
2017-10-25   26086.69
2017-10-27   13217.54

In [12]: df.set_index("date").resample("2D").sum()
Out[12]:
            dax-close
date
2017-10-23   26016.33
2017-10-25   26086.69
2017-10-27   13217.54

The former gives more flexibility in that you can groupby multiple columns:

In [21]: df["X"] = list("AABAC")

In [22]: df.groupby(["X", pd.Grouper(key="date", freq="2D")]).sum()
Out[22]:
              dax-close
X date
A 2017-10-23   26016.33
  2017-10-25   13133.28
B 2017-10-25   12953.41
C 2017-10-27   13217.54

Is datetime data in Pandas supposed to be in the index?

Answers (1)

Related Questions