Fizi
Fizi

Reputation: 1851

pandas: groupby and aggregate without losing the column which was grouped

I have a pandas dataframe as below. For each 'Id' I can have multiple 'Names' and 'Sub-ids'.

Id      NAME   SUB_ID
276956  A      5933
276956  B      5934
276956  C      5935
287266  D      1589

I want to condense the dataframe such that there is only one row for each 'Id' and all the 'names' and 'sub_ids' under each 'Id' appear as a single set on that row:

Id      NAME           SUB_ID
276956  set(A,B,C)     set(5933,5934,5935)
287266  set(D)         set(1589) 

I tried to groupby 'Id' and then aggregate over all the other columns:

df.groupby('Id').agg(lambda x: set(x))

But in doing so the resulting dataframe does not have the 'Id' column. When you do groupby the 'Id' is returned as the first value of the tuple but I guess when you aggregate that is lost. Is there a way to get the dataframe that I am looking for: to groupby and aggregate without losing the column which was grouped.

Upvotes: 29

Views: 41607

Answers (2)

Zeugma
Zeugma

Reputation: 32095

If you don't want the groupby as an index, there is an argument for it to avoid further reset:

df.groupby('Id', as_index=False).agg(lambda x: set(x))

Upvotes: 33

chrisaycock
chrisaycock

Reputation: 37930

The groupby column becomes the index. You can simply reset the index to get it back:

In [4]: df.groupby('Id').agg(lambda x: set(x)).reset_index()
Out[4]: 
       Id       NAME              SUB_ID
0  276956  {A, C, B}  {5933, 5934, 5935}
1  287266        {D}              {1589}

Upvotes: 15

Related Questions