Edamame

Reputation: 25366

Python - Pandas data frame: customized aggregation function after groupby?

I am using an aggregation function after calling groupby on a pandas DataFrame, like:

my_df.groupby(['id']).agg(['count'])

I am wondering whether it is possible to use a custom aggregation function. For example, in my data frame:

id       color 
--------------------     
001       red
001       blue
001       yellow
002       green
002       black
003       yellow
003       white
003       blue

I want to create a custom function called all_color so that I could do something like:

my_df.groupby(['id']).agg(['all_color'])

and get the output data frame as:

id        all_color
--------------------
001       [red,blue,yellow]
002       [green,black]
003       [yellow,white,blue]
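A minimal sketch of the custom function described above, assuming the question's sample data with the ids stored as strings so the leading zeros survive. agg resolves a string such as 'count' to a built-in method, so a user-defined function has to be passed as the function object rather than as the string 'all_color'. The all_color helper here is hypothetical (it simply returns each group's colors as a list), and the output column is labelled with pandas' named aggregation (available since pandas 0.25).

```python
import pandas as pd

# Sample data from the question; ids kept as strings (assumption) so "001" stays "001".
my_df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black", "yellow", "white", "blue"],
})


def all_color(colors: pd.Series) -> list:
    """Hypothetical helper: collect every color in one group into a Python list."""
    return colors.tolist()


# Named aggregation: the output column "all_color" is built by calling
# all_color on the "color" values of each id group.
result = my_df.groupby("id").agg(all_color=("color", all_color))
print(result)

# Closer to the syntax in the question: pass the function object (not its name
# as a string) in a list; the resulting column is named after the function.
result_list_form = my_df.groupby("id")["color"].agg([all_color])
```

The function object can be reused anywhere agg accepts a callable; only the string form is limited to names pandas already knows.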

Upvotes: 0

Views: 149

Answers (2)

Zero

Reputation: 76947

Use the apply method together with tolist() to convert each group's values to a list:

In [12]: df.groupby('id')['color'].apply(lambda x: x.tolist())
Out[12]:
id
1      [red, blue, yellow]
2           [green, black]
3    [yellow, white, blue]
Name: color, dtype: object

Use reset_index to convert the resulting Series to a DataFrame:

In [21]: df.groupby('id')['color'].apply(lambda x: x.tolist()).reset_index()
Out[21]:
   id                  color
0   1    [red, blue, yellow]
1   2         [green, black]
2   3  [yellow, white, blue]
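The session above does not show how df was built, so here is a self-contained sketch of the same apply/tolist approach. It assumes the question's sample data with string ids (the outputs above show the ids as integers 1, 2, 3). Passing name= to Series.reset_index labels the list column all_color, matching the output the question asked for.

```python
import pandas as pd

# Sample data from the question; ids kept as strings (assumption).
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black", "yellow", "white", "blue"],
})

# apply() calls the lambda once per id group; tolist() turns that group's
# colors into a Python list, giving a Series of lists indexed by id.
colors_per_id = df.groupby("id")["color"].apply(lambda x: x.tolist())

# Series.reset_index(name=...) moves id back into a column and names the
# column holding the lists.
out = colors_per_id.reset_index(name="all_color")
print(out)
```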

Upvotes: 1

Andy Hayden

Reputation: 375675

If you want this as a DataFrame, you can use pivot_table:

In [11]: pd.pivot_table(df, values="id", index=df["id"], columns=df["color"], aggfunc='count', fill_value=0)
Out[11]:
color  black  blue  green  red  white  yellow
id
1          0     1      0    1      0       1
2          1     0      1    0      0       0
3          0     1      0    0      1       1

Note: this is very similar to the output of get_dummies.
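To illustrate the get_dummies note, a small sketch assuming the question's sample data with string ids: get_dummies creates one 0/1 indicator column per distinct color, and summing those indicators within each id reproduces the count table above. pd.crosstab is swapped in as an additional one-call alternative; it is not part of the original answer.

```python
import pandas as pd

# Sample data from the question; ids kept as strings (assumption).
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black", "yellow", "white", "blue"],
})

# One indicator column per color (one row per original row); grouping by id
# and summing turns the indicators into per-id counts.
counts = pd.get_dummies(df["color"]).groupby(df["id"]).sum()

# pd.crosstab builds the same id-by-color frequency table directly.
ct = pd.crosstab(df["id"], df["color"])
print(counts)
```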

Upvotes: 0
