Pandas Multi-index DataFrame: aggregate sub-groups within groups

Question

I have the following multi-indexed DataFrame:

df = pd.DataFrame({
      'cluster': [1, 1, 2, 1, 2, 2, 1, 3, 2],
      'mark': [8, 5, 10, 20, 4, 1, 6, 4, 1],
      'dt': ['2021-03-23', '2021-03-25', '2021-03-23', '2021-03-28', '2021-03-25', '2021-03-28', '2021-03-29', '2021-03-23', '2021-03-31']
})
df.set_index(['cluster', 'dt'], inplace=True)
df.sort_index(inplace=True)
df


                                mark
cluster     dt  
1           2021-03-23          8
            2021-03-25          5
            2021-03-28          20
            2021-03-29          6
2           2021-03-23          10
            2021-03-25          4
            2021-03-28          1
            2021-03-31          1
3           2021-03-23          4

What I am looking to do is to generate the sum over the weekly subgroups within each indexed group, something like


                                total mark
cluster     start_date_of_week  
1           2021-03-21          13
            2021-03-28          26
2           2021-03-21          14
            2021-03-28          2
3           2021-03-23          4

I know how to generate groups based on frequency of a datetime index. I don't know how to do this using one level of a multi-index. Any ideas?

Andrej Kesely · Accepted Answer

Make sure the the level 1 of the index is of type datetime. Then you can do:

print(
    df.groupby(
        [
            pd.Grouper(level=0),
            pd.Grouper(level=1, freq="W", label="left", closed="left"),
        ]
    )["mark"]
    .sum()
    .to_frame()
)

Prints:

                    mark
cluster dt              
1       2021-03-21    13
        2021-03-28    26
2       2021-03-21    14
        2021-03-28     2
3       2021-03-21     4

Pandas Multi-index DataFrame: aggregate sub-groups within groups

Answers (1)

Related Questions