Python - Pandas data frame: customized aggregation function after groupby?

Question

I am using an aggregation function after a groupby on a pandas DataFrame, like:

my_df.groupby(['id']).agg(['count'])

Is it possible to use a customized aggregation function? For example, with this data frame:

id       color 
--------------------     
001       red
001       blue
001       yellow
002       green
002       black
003       yellow
003       white
003       blue

I want to create a customized function called all_color, so I could do something like:

my_df.groupby(['id']).agg(['all_color'])

and get the output data frame as:

id        all_color
--------------------
001       [red,blue,yellow]
002       [green,black]
003       [yellow,white,blue]

Zero · Accepted Answer

Use the apply function with the tolist() method to convert each group's values to a list.

In [12]: df.groupby('id')['color'].apply(lambda x: x.tolist())
Out[12]:
id
1      [red, blue, yellow]
2           [green, black]
3    [yellow, white, blue]
Name: color, dtype: object
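
For anyone who wants to run this end to end, here is a minimal sketch that builds the frame from the question first. It assumes the id column holds strings such as "001" (so the leading zeros survive); the name colors_by_id is only illustrative and is not in the original post.

```python
import pandas as pd

# Sample data from the question; ids are assumed to be strings
# so that the leading zeros are kept.
my_df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black",
              "yellow", "white", "blue"],
})

# Collect the colors of each id into a Python list.
# The result is a Series indexed by id.
colors_by_id = my_df.groupby("id")["color"].apply(lambda x: x.tolist())
print(colors_by_id)
```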

Use reset_index to convert the resulting Series to a DataFrame.

In [21]: df.groupby('id')['color'].apply(lambda x: x.tolist()).reset_index()
Out[21]:
   id                  color
0   1    [red, blue, yellow]
1   2         [green, black]
2   3  [yellow, white, blue]
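
If you would rather stay with agg and get a column literally named all_color, as the question asks, one possible sketch is to pass a plain function through named aggregation. This assumes a pandas version that supports named aggregation (0.25 or later); the function all_color is the hypothetical one from the question, not a pandas built-in.

```python
import pandas as pd

def all_color(colors):
    """Custom aggregation: return all values of a group as a list."""
    return colors.tolist()

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3],
    "color": ["red", "blue", "yellow", "green", "black",
              "yellow", "white", "blue"],
})

# Named aggregation: output column name = (input column, function).
result = df.groupby("id").agg(all_color=("color", all_color)).reset_index()
print(result)
```

Note that agg expects the function to return one value per group; a list counts as a single object here, so this should behave the same as the apply version above.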
