Reputation: 7431
By supposed to, what I mean is Is that the way Pandas is designed?, Are all Pandas time series functions built upon that assumption?
A few weeks ago I was experimenting with pandas.rolling_mean which seemed to want the datetime to be in the index.
Given a dataframe like this:
df = pd.DataFrame({'date' : ['23/10/2017', '24/10/2017', '25/10/2017','26/10/2017','27/10/2017'], 'dax-close' : [13003.14, 13013.19, 12953.41,13133.28,13217.54]})
df['date'] = pd.to_datetime(df['date'])
df
...is it important to always do this:
df.set_index('date', inplace=True)
df
...as one of the first steps of an analysis?
Upvotes: 1
Views: 326
Reputation: 375855
The short answer is usually timeseries data has date as a DatetimeIndex. and many pandas functions do make use of that e.g. resample
is a big one.
That said, you don't need to have Dates as an index, for example you may even have multiple Datetime columns, then you're out of luck calling the vanilla resample... however you can use pd.Grouper
to define the "resample" on a column (or as part of a larger/multi-column groupby)
In [11]: df.groupby(pd.Grouper(key="date", freq="2D")).sum()
Out[11]:
dax-close
date
2017-10-23 26016.33
2017-10-25 26086.69
2017-10-27 13217.54
In [12]: df.set_index("date").resample("2D").sum()
Out[12]:
dax-close
date
2017-10-23 26016.33
2017-10-25 26086.69
2017-10-27 13217.54
The former gives more flexibility in that you can groupby multiple columns:
In [21]: df["X"] = list("AABAC")
In [22]: df.groupby(["X", pd.Grouper(key="date", freq="2D")]).sum()
Out[22]:
dax-close
X date
A 2017-10-23 26016.33
2017-10-25 13133.28
B 2017-10-25 12953.41
C 2017-10-27 13217.54
Upvotes: 1