Alex
Alex

Reputation: 1381

resample a pandas df within each group

I have a df that has a MultiIndex of (id, date) and I would like to do 2 things:

  1. convert the DateTimeIndex named date to a PeriodIndex within each id group

  2. resample the frequency of the PeriodIndex to monthly from daily

My current (non-working) method is to (even before converting to PeriodIndex):

df = pd.DataFrame(data = {"val": np.arange(30), 
                          "id": np.tile([1,2], 15),
                          "date": np.repeat(pd.date_range(start = "2000-01-01", periods = 15, name="date"), 2)
                         })

df = df.set_index(["id", "date"]).sort_index() 
df.groupby("id")["val"].resample(rule = "M", closed = "right", label = "right").apply(lambda x: np.sqrt(sum(x)/10))

This raises:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'

What's the right way to do the whole procedure? I'm a bit confused about how to think about groupby: my mental model is that anything that follows a groupby operation will only receive the subframe corresponding to that group (ie the MultiIndex becomes a single index of just date within that particular group). Is this not correct?

Upvotes: 0

Views: 2070

Answers (1)

jezrael
jezrael

Reputation: 862681

If use DataFrameGroupBy.resample is necessary DatetimeIndex set before groupby, also apply is not necessary, faster is resample sum, then divide final Series by 10 and then use np.sqrt:

df = df.set_index(["date"]).sort_index() 

df1 = (np.sqrt(df.groupby("id")["val"]
                 .resample(rule = "M", closed = "right", label = "right")
                 .sum()
                 .div(10)))

print (df1)
id  date      
1   2000-01-31    4.582576
2   2000-01-31    4.743416
Name: val, dtype: float64

Upvotes: 2

Related Questions