Iterate over chunks of dataframe by time period

Question

I have a pandas dataframe indexed by time:

>>> df
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
2000-01-06 -0.646513 -0.170194  0.365317  0.121467
2000-01-07  0.461580  0.259200  0.734326  1.885612
2000-01-10 -1.277500  0.840206 -0.570010  0.155367
...

I want to efficiently partition this dataframe, which has a sorted index, by a datetime period. The result should be an iterator of smaller dataframes:

seq = partition_all(df, freq='1M')

>>> next(seq)
               A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
...
>>> next(seq)
               A         B         C         D
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062
...

Andy Hayden · Accepted Answer

You can group by month with pd.Grouper and the month-end frequency "M" (spelled "ME" from pandas 2.2 on). Older pandas versions called this pd.TimeGrouper, which has since been deprecated and removed:

In [11]: df
Out[11]:
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062

In [12]: g = df.groupby(pd.Grouper(freq="M"))

Now you can iterate through the GroupBy for each month:

In [13]: for (month_end, sub_df) in g:
   ....:     print(sub_df)
   ....:
                   A         B         C         D
2000-01-03  1.991135  0.045306 -0.657898  0.311375
2000-01-04  0.690848  1.862244  0.709432 -2.080355
2000-01-05  0.602610 -0.205035  1.248848  0.192274
                   A         B         C         D
2000-02-01 -0.108412  0.188484 -0.568542  0.335969
2000-02-02  0.855690 -0.283225  1.471867  0.309235
2000-02-03 -0.266090  0.684080  0.187856  1.734062
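
To get the iterator interface the question asks for, the loop above can be wrapped in a generator. This is only a sketch: partition_all is the hypothetical name from the question, not a pandas function, and the sample values are made up. It assumes the dataframe has a sorted DatetimeIndex. The example uses "MS" (calendar months labelled by their first day) because that alias is spelled the same across pandas versions; any other frequency string can be passed instead.

```python
import pandas as pd


def partition_all(df, freq):
    """Yield one sub-DataFrame per `freq` period, in index order.

    Hypothetical helper; assumes `df` has a sorted DatetimeIndex.
    """
    for _, sub_df in df.groupby(pd.Grouper(freq=freq)):
        # Periods with no rows can come back as empty frames; skip them.
        if not sub_df.empty:
            yield sub_df


# Small illustrative frame (made-up values) with no rows in February.
index = pd.to_datetime(
    ["2000-01-03", "2000-01-04", "2000-01-05", "2000-03-01", "2000-03-02"]
)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)

seq = partition_all(df, freq="MS")
first = next(seq)   # the three January rows
second = next(seq)  # the two March rows; February is skipped
```

Skipping empty frames is a design choice: drop the check if you would rather receive an empty dataframe for every period that has no rows.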
