Reputation: 145
If I do a groupby() followed by a rolling() calculation on a DataFrame with a multi-level index, one of the index levels is repeated in the result, which is most odd. I am using pandas 0.18.1.
import pandas as pd

df = pd.DataFrame(data=[[1, 1, 10, 20], [1, 2, 30, 40], [1, 3, 50, 60],
                        [2, 1, 11, 21], [2, 2, 31, 41], [2, 3, 51, 61]],
                  columns=['id', 'date', 'd1', 'd2'])
df.set_index(['id', 'date'], inplace=True)
df = df.groupby(level='id').rolling(window=2)['d1'].sum()
print(df)
print(df.index)
The output is as follows:
id  id  date
1   1   1        NaN
        2       40.0
        3       80.0
2   2   1        NaN
        2       42.0
        3       82.0
Name: d1, dtype: float64
MultiIndex(levels=[[1, 2], [1, 2], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
           names=[u'id', u'id', u'date'])
What is odd is that the id level now shows up twice in the multi-index. Moving the ['d1'] column selection around doesn't make any difference.
Any help would be much appreciated.
Thanks, Paul
Upvotes: 7
Views: 2752
Reputation: 4254
You can also use droplevel to remove the duplicated outer id level:
df = df.groupby(level="id").rolling(window=2).sum().droplevel(0)
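Note that the one-liner above keeps both d1 and d2 and assumes the group key really was prepended. Here is a minimal sketch, using the question's data, that selects d1 and only drops the outer level when the result has more index levels than the input. Whether the key gets prepended seems to depend on the pandas version, so the check is there as a precaution rather than as documented behaviour.

```python
import pandas as pd

# The question's data, indexed by (id, date).
df = pd.DataFrame(
    data=[[1, 1, 10, 20], [1, 2, 30, 40], [1, 3, 50, 60],
          [2, 1, 11, 21], [2, 2, 31, 41], [2, 3, 51, 61]],
    columns=["id", "date", "d1", "d2"],
).set_index(["id", "date"])

# Rolling sum of d1 within each id group.
rolled = df.groupby(level="id")["d1"].rolling(window=2).sum()

# If the group key was prepended, the result has one more index level
# than the input; drop that outermost (duplicated) level.
if rolled.index.nlevels > df.index.nlevels:
    rolled = rolled.droplevel(0)

print(rolled)
print(list(rolled.index.names))
```

Calling droplevel(0) unconditionally on a version that does not duplicate the level would remove the real id level, which is why the sketch compares nlevels first.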
Upvotes: 0
Reputation: 12406
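With pandas==1.1.1, it looks like this can also be done without .apply. In the snippets below, test_df is assumed to be the question's DataFrame with ['id', 'date'] set as the index.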
Using .apply
method1 = test_df.groupby(level="id").d1.apply(lambda x: x.rolling(window=2).sum())
print(method1)
id  date
1   1        NaN
    2       40.0
    3       80.0
2   1        NaN
    2       42.0
    3       82.0
Name: d1, dtype: float64
Without using .apply
method2 = test_df.groupby(level="id").d1.rolling(window=2).sum()
print(method2)
id  date
1   1        NaN
    2       40.0
    3       80.0
2   1        NaN
    2       42.0
    3       82.0
Name: d1, dtype: float64
import numpy as np

try:
    np.testing.assert_array_equal(method1.to_numpy(), method2.to_numpy())
    print("Matching outputs")
except AssertionError as err:
    print("Mismatching outputs")
Result of checking equality
Matching outputs
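For reference, here is a self-contained sketch of the same comparison, assuming test_df is the question's DataFrame indexed by id and date. Passing group_keys=False to groupby is an addition here, not part of the snippets above; it is meant to keep the .apply result on the original (id, date) index on pandas versions where apply would otherwise prepend the group key. Only the values are compared, since the index of the rolling result may differ between versions.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of test_df from the question's data.
test_df = pd.DataFrame(
    data=[[1, 1, 10, 20], [1, 2, 30, 40], [1, 3, 50, 60],
          [2, 1, 11, 21], [2, 2, 31, 41], [2, 3, 51, 61]],
    columns=["id", "date", "d1", "d2"],
).set_index(["id", "date"])

# Using .apply; group_keys=False asks pandas not to prepend the group key.
via_apply = test_df.groupby(level="id", group_keys=False)["d1"].apply(
    lambda s: s.rolling(window=2).sum()
)

# Without .apply.
via_rolling = test_df.groupby(level="id")["d1"].rolling(window=2).sum()

# assert_array_equal treats NaNs in matching positions as equal.
np.testing.assert_array_equal(via_apply.to_numpy(), via_rolling.to_numpy())
print(list(via_apply.index.names))
```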
Upvotes: 0