Reputation: 1381
I have a df that has a MultiIndex
of (id, date)
and I would like to do 2 things:
convert the DateTimeIndex
named date
to a PeriodIndex
within each id
group
resample
the frequency of the PeriodIndex
to monthly from daily
My current (non-working) method is to (even before converting to PeriodIndex
):
df = pd.DataFrame(data = {"val": np.arange(30),
"id": np.tile([1,2], 15),
"date": np.repeat(pd.date_range(start = "2000-01-01", periods = 15, name="date"), 2)
})
df = df.set_index(["id", "date"]).sort_index()
df.groupby("id")["val"].resample(rule = "M", closed = "right", label = "right").apply(lambda x: np.sqrt(sum(x)/10))
This raises:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'
What's the right way to do the whole procedure? I'm a bit confused about how to think about groupby
: my mental model is that anything that follows a groupby
operation will only receive the subframe corresponding to that group (ie the MultiIndex
becomes a single index of just date
within that particular group). Is this not correct?
Upvotes: 0
Views: 2070
Reputation: 862681
If use DataFrameGroupBy.resample
is necessary DatetimeIndex
set before groupby
, also apply
is not necessary, faster is resample
sum
, then divide final Series
by 10 and then use np.sqrt
:
df = df.set_index(["date"]).sort_index()
df1 = (np.sqrt(df.groupby("id")["val"]
.resample(rule = "M", closed = "right", label = "right")
.sum()
.div(10)))
print (df1)
id date
1 2000-01-31 4.582576
2 2000-01-31 4.743416
Name: val, dtype: float64
Upvotes: 2