cardamom
cardamom

Reputation: 7431

Is datetime data in Pandas supposed to be in the index?

By supposed to, what I mean is Is that the way Pandas is designed?, Are all Pandas time series functions built upon that assumption?

A few weeks ago I was experimenting with pandas.rolling_mean which seemed to want the datetime to be in the index.

Given a dataframe like this:

df = pd.DataFrame({'date' : ['23/10/2017', '24/10/2017', '25/10/2017','26/10/2017','27/10/2017'], 'dax-close' : [13003.14, 13013.19, 12953.41,13133.28,13217.54]})
df['date'] = pd.to_datetime(df['date'])
df

datetime as column

...is it important to always do this:

df.set_index('date', inplace=True)
df

datetime as index

...as one of the first steps of an analysis?

Upvotes: 1

Views: 326

Answers (1)

Andy Hayden
Andy Hayden

Reputation: 375855

The short answer is usually timeseries data has date as a DatetimeIndex. and many pandas functions do make use of that e.g. resample is a big one.

That said, you don't need to have Dates as an index, for example you may even have multiple Datetime columns, then you're out of luck calling the vanilla resample... however you can use pd.Grouper to define the "resample" on a column (or as part of a larger/multi-column groupby)

In [11]: df.groupby(pd.Grouper(key="date", freq="2D")).sum()
Out[11]:
            dax-close
date
2017-10-23   26016.33
2017-10-25   26086.69
2017-10-27   13217.54

In [12]: df.set_index("date").resample("2D").sum()
Out[12]:
            dax-close
date
2017-10-23   26016.33
2017-10-25   26086.69
2017-10-27   13217.54

The former gives more flexibility in that you can groupby multiple columns:

In [21]: df["X"] = list("AABAC")

In [22]: df.groupby(["X", pd.Grouper(key="date", freq="2D")]).sum()
Out[22]:
              dax-close
X date
A 2017-10-23   26016.33
  2017-10-25   13133.28
B 2017-10-25   12953.41
C 2017-10-27   13217.54

Upvotes: 1

Related Questions