I am applying an aggregation function after a groupby on a pandas DataFrame, like:
my_df.groupby(['id']).agg(['count'])
I am wondering whether it is possible to use a custom aggregation function. For example, in my data frame:
id color
--------------------
001 red
001 blue
001 yellow
002 green
002 black
003 yellow
003 white
003 blue
I want to create a custom function called all_color, so that I could do something like:
my_df.groupby(['id']).agg(['all_color'])
and get the output data frame as:
id all_color
--------------------
001 [red,blue,yellow]
002 [green,black]
003 [yellow,white,blue]
Use the apply function with the tolist() method to convert the values of each group to a list.
In [12]: df.groupby('id')['color'].apply(lambda x: x.tolist())
Out[12]:
id
1 [red, blue, yellow]
2 [green, black]
3 [yellow, white, blue]
Name: color, dtype: object
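The lambda can also be replaced by a named function passed to agg(), which is closer to the all_color the question asks for. This is a minimal sketch assuming a recent pandas version; all_color is a hypothetical helper defined here for illustration, not a built-in aggregation name.

```python
import pandas as pd

# Sample data from the question; ids kept as strings so "001" keeps its zeros
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black", "yellow", "white", "blue"],
})


def all_color(colors):
    """Hypothetical custom aggregation: gather one group's colors into a list."""
    return colors.tolist()


# agg() accepts any callable; it is called once per id group with that group's Series
colors_by_id = df.groupby("id")["color"].agg(all_color)
print(colors_by_id)
```

Note that the function object itself is passed, not the string 'all_color'; strings given to agg() are resolved as names of built-in groupby methods. For this particular case the built-in list also works, as in agg(list).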
Use reset_index to convert the Series to a DataFrame:
In [21]: df.groupby('id')['color'].apply(lambda x: x.tolist()).reset_index()
Out[21]:
id color
0 1 [red, blue, yellow]
1 2 [green, black]
2 3 [yellow, white, blue]
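To get a column named all_color directly, as in the question's desired output, named aggregation can be used. This sketch assumes pandas 0.25 or later (where named aggregation was introduced) and again uses a hypothetical all_color helper defined only for illustration.

```python
import pandas as pd

# Sample data from the question; ids kept as strings so "001" keeps its zeros
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black", "yellow", "white", "blue"],
})


def all_color(colors):
    """Hypothetical custom aggregation: gather one group's colors into a list."""
    return colors.tolist()


# Named aggregation: output_column=(source_column, aggregation_function)
result = df.groupby("id").agg(all_color=("color", all_color)).reset_index()
print(result)
```

With the apply() version above, calling reset_index(name='all_color') on the resulting Series is another way to get the same column name.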
If you instead want a per-id count of each color as a DataFrame, you can use pivot_table:
In [11]: pd.pivot_table(df, values="id", index=df["id"], columns=df["color"], aggfunc='count', fill_value=0)
Out[11]:
color black blue green red white yellow
id
1 0 1 0 1 0 1
2 1 0 1 0 0 0
3 0 1 0 0 1 1
Note: this is very similar to the output of get_dummies.
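The get_dummies comparison can be made concrete: one-hot encoding the color column and summing the indicator columns per id gives the same counts, and pd.crosstab tabulates them in a single call. This is a sketch on the question's data; crosstab is a swapped-in alternative, not part of the answer above.

```python
import pandas as pd

# Sample data from the question; ids kept as strings so "001" keeps its zeros
df = pd.DataFrame({
    "id": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "color": ["red", "blue", "yellow", "green", "black", "yellow", "white", "blue"],
})

# One indicator column per color, then add the indicators up within each id
dummy_counts = pd.get_dummies(df["color"]).groupby(df["id"]).sum()

# crosstab counts id/color co-occurrences directly (absent pairs become 0)
crosstab_counts = pd.crosstab(df["id"], df["color"])

print(crosstab_counts)
```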