Reputation: 127
I have the following multi-indexed DataFrame:
df = pd.DataFrame({
'cluster': [1, 1, 2, 1, 2, 2, 1, 3, 2],
'mark': [8, 5, 10, 20, 4, 1, 6, 4, 1],
'dt': ['2021-03-23', '2021-03-25', '2021-03-23', '2021-03-28', '2021-03-25', '2021-03-28', '2021-03-29', '2021-03-23', '2021-03-31']
})
df.set_index(['cluster', 'dt'], inplace=True)
df.sort_index(inplace=True)
df
mark cluster dt 1 2021-03-23 8 2021-03-25 5 2021-03-28 20 2021-03-29 6 2 2021-03-23 10 2021-03-25 4 2021-03-28 1 2021-03-31 1 3 2021-03-23 4
What I am looking to do is to generate the sum over the weekly subgroups within each indexed group, something like
total mark cluster start_date_of_week 1 2021-03-21 13 2021-03-28 26 2 2021-03-21 14 2021-03-28 2 3 2021-03-23 4
I know how to generate groups based on frequency of a datetime index. I don't know how to do this using one level of a multi-index. Any ideas?
Upvotes: 3
Views: 163
Reputation: 195478
Make sure the the level 1 of the index is of type datetime. Then you can do:
print(
df.groupby(
[
pd.Grouper(level=0),
pd.Grouper(level=1, freq="W", label="left", closed="left"),
]
)["mark"]
.sum()
.to_frame()
)
Prints:
mark
cluster dt
1 2021-03-21 13
2021-03-28 26
2 2021-03-21 14
2021-03-28 2
3 2021-03-21 4
Upvotes: 1