calben

Reputation: 1358

Apply resampling to each group in a groupby object

I've created a convenience method to perform resampling on an arbitrary dataframe:

def resample_data_to_hourly(df):
    df = df.resample('1H', how='mean', fill_method='ffill',
                     closed='left', label='left')
    return df

And I would like to apply this function to every dataframe in a groupby object with something like the following:

df.transform(resample_data_to_hourly)
df.aggregate(resample_data_to_hourly)
df.apply(resample_data_to_hourly)

I've tried them all with no success. Whichever one I use, the dataframe is unchanged, even when I assign the result to a new dataframe (which, to my understanding, I shouldn't have to do).

I'm sure there is something straightforward and idiomatic about handling groupby objects with time series data that I am missing here, but I haven't been able to correct my program.

How do I write functions like the one above so that they apply properly to a groupby object? I can get my code to work by iterating over the groups as I would over a dictionary, adding each result to a new dictionary, and converting that back into a groupby object. This is terribly hacky, though, and I feel like I'm missing out on a lot of what pandas can do.

EDIT: adding a base example:

import pandas as pd
from numpy.random import randn
rng = pd.date_range('1/1/2000', periods=10, freq='10m')
df = pd.DataFrame({'a': pd.Series(randn(len(rng)), index=rng), 'b': pd.Series(randn(len(rng)), index=rng)})

yields:

                       a         b
    2000-01-31  0.168622  0.539533
    2000-11-30 -0.283783  0.687311
    2001-09-30 -0.266917 -1.511838
    2002-07-31 -0.759782 -0.447325
    2003-05-31 -0.110677  0.061783
    2004-03-31  0.217771  1.785207
    2005-01-31  0.450280  1.759651
    2005-11-30  0.070834  0.184432
    2006-09-30  0.254020 -0.895782
    2007-07-31 -0.211647 -0.072757

df.groupby('a').transform(resample_data_to_hourly)  # should yield resampled data with both a and b columns
# instead yields only column b
# df.apply yields both columns, but here no changes are made to the actual matrix
# (no change is expected for this sample, though data could be generated such that a change should be made)
# if someone could supply a reliable way to generate data that can be resampled, that would be wonderful

Upvotes: 1

Views: 1395

Answers (1)

Fil

Reputation: 1834

(data.groupby(level=0)
     .apply(lambda d: d.reset_index(level=0, drop=True)
                       .resample("M", how="")))
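For readers on a recent pandas release, here is a minimal sketch of the same group-wise idea. The `how=` and `fill_method=` arguments to `resample` were removed in later versions, so the aggregation and the fill are chained as methods instead. The `sensor` and `value` columns, the sample data, and the helper name `resample_to_hourly` are hypothetical and only serve the illustration. The data is sampled every 20 minutes so that an hourly resample actually changes it.

```python
import numpy as np
import pandas as pd


def resample_to_hourly(df):
    # Rough modern equivalent of the question's helper: average each hour,
    # then forward-fill any hour that had no observations.
    return df.resample("60min", closed="left", label="left").mean().ffill()


# Hypothetical sample data: two sensors, one reading every 20 minutes,
# so every hourly bin contains three readings to average.
rng = pd.date_range("2000-01-01", periods=6, freq="20min")
data = pd.DataFrame(
    {
        "sensor": ["x"] * 6 + ["y"] * 6,
        "value": np.arange(12, dtype=float),
    },
    index=rng.append(rng),
)

# Apply the helper to each group; the group label becomes the first index level.
hourly = data.groupby("sensor", group_keys=True)[["value"]].apply(resample_to_hourly)
print(hourly)
```

The result has to be assigned to a new name because `apply` returns a new object rather than modifying `data` in place. `apply` is used rather than `transform` because the resampled frame does not keep the shape of the input. Current pandas also offers `groupby(...).resample(...)`, which may be a more direct spelling of the same operation.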

Upvotes: 3
