wannabedatasth
wannabedatasth

Reputation: 79

Group by column but return dataframe as the initial one - pandas

I have a pandas dataframe looking like this:

id attr1 attr2 attr3 label
123 2 or less category1 other_cat1 2
45 between 3 and 4 category3 other_cat3 7
123 2 or less category1 other_cat1 2
76 5 or more category4 other_cat4 9
45 between 3 and 4 category3 other_cat3 7
123 2 or less category1 other_cat1 1

What I want to have is a resulting dataframe with the same columns but the labels to be grouped by the id counts. I have done this:

df.groupby(['label'])['id'].count().reset_index()

which results of course in the following :

label id
1 1
2 2
7 2
9 1

The goal is to have the following:

id attr1 attr2 attr3 label counts
123 2 or less category1 other_cat1 2 2
45 between 3 and 4 category3 other_cat3 7 2
76 5 or more category4 other_cat4 9 1
123 2 or less category1 other_cat1 1 1

The attr1, attr2, attr3 are always the same per id. Only the label value may differ per id. Thus, my problem can be summarized in the existence of multiple identical rows and I thought that I could group them like that in order to make it more concise.

I can't use the pandas transform method because I don't want the returning dataframe to have the same shape as the initial one. I can't use the pandas aggregate method because I need to have all the columns as the initial dataframe. I also can't think of a situation where I can use the pandas apply method.

I know probably that the answer somehow lies behind the use of one the above methods or some combination of them but I am lacking the experience to think of it. I would appreciate some guidance/help. Thank you.

Upvotes: 0

Views: 38

Answers (1)

Amit Vikram Singh
Amit Vikram Singh

Reputation: 2128

First use .groupby.transform to add count column in the dataframe. Then drop the duplicate rows.

Use:

df['count'] = df.groupby('label')['id'].transform('count')
out = df.drop_duplicates(keep = 'first')

Output:

>>> out
    id            attr1      attr2       attr3  label  count
0  123        2 or less  category1  other_cat1      2      2
1   45  between 3 and 4  category3  other_cat3      7      2
3   76        5 or more  category4  other_cat4      9      1
5  123        2 or less  category1  other_cat1      1      1

Upvotes: 1

Related Questions