user2808117

Reputation: 4877

How to find subgroups statistics in pandas?

I am grouping a DataFrame by multiple columns (e.g., columns A and B: my_df.groupby(['A','B'])). Is there a better way (fewer lines of code, faster) to find how many rows are in each subgroup and how many subgroups there are in total? At the moment I am using:

def get_grp_size(grp):
    grp['size'] = len(grp)
    return grp
my_df = my_df.groupby(['A','B']).apply(get_grp_size)
len(my_df[['A','B','size']].drop_duplicates())

Upvotes: 2

Views: 2761

Answers (1)

roman

Reputation: 117345

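# rows per (A, B) subgroup; count() would instead give non-null values per column
my_df.groupby(['A', 'B']).size()
# total number of subgroups
len(my_df.groupby(['A', 'B']).groups)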

To add a column with the counts, you can use transform:

my_df["size"] = my_df.groupby(['A', 'B'])['A'].transform('size')
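
For reference, here is a minimal sketch that puts the pieces together on a toy frame. The sample data and the "value" column are made up for illustration; only the column names A and B come from the question. It uses the groupby `ngroups` attribute as an alternative to taking `len` of `.groups`.

```python
import pandas as pd

# Hypothetical sample data; only the column names A and B come from the question.
my_df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "y"],
    "B": [1, 1, 1, 2, 2],
    "value": [10, 20, 30, 40, 50],
})

grouped = my_df.groupby(["A", "B"])

# Rows per subgroup: a Series indexed by the (A, B) pairs.
group_sizes = grouped.size()

# Total number of subgroups.
n_groups = grouped.ngroups

# Broadcast each subgroup's size back onto its rows as a new column.
my_df["size"] = grouped["value"].transform("size")

print(group_sizes)
print(n_groups)
print(my_df)
```

Building the groupby object once and reusing it avoids regrouping for each statistic, which should matter more as the frame grows.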

Upvotes: 1
