Reputation: 21574
Starting from this dataframe df:
df = pd.DataFrame({'c':[1,1,1,2,2,2],'l1':['a','a','b','c','c','b'],'l2':['b','d','d','f','e','f']})
c l1 l2
0 1 a b
1 1 a d
2 1 b d
3 2 c f
4 2 c e
5 2 b f
I would like to perform a groupby over the c column to get the unique values of the l1 and l2 columns. For one column I can do:
g = df.groupby('c')['l1'].unique()
that correctly returns:
c
1 [a, b]
2 [c, b]
Name: l1, dtype: object
but using:
g = df.groupby('c')[['l1','l2']].unique()
returns:
AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'
I know I can get the unique values for the two columns with (among others):
In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)
Is there a way to apply this method to the groupby in order to get something like:
c
1 [a, b, d]
2 [c, b, e, f]
Name: l1, dtype: object
Upvotes: 59
Views: 141394
Reputation: 17911
A shorter version without the lambda function (select the columns explicitly so that the grouping column c is never passed to np.unique):
df.groupby('c')[['l1','l2']].apply(np.unique)
Output:
c
1 [a, b, d]
2 [b, c, e, f]
dtype: object
Upvotes: 0
Reputation: 20689
One more alternative is to use GroupBy.agg with set, which returns one set per column for each group:
df.groupby('c').agg(set)
l1 l2
c
1 {a, b} {d, b}
2 {c, b} {e, f}
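If a single combined collection per group is wanted, the per-column sets can be merged afterwards. A minimal sketch, assuming the question's df and exactly the two columns l1 and l2; the names sets and combined are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# One set per column and per group, as in the answer above
sets = df.groupby('c').agg(set)

# Union the two sets group by group (illustrative name: combined)
combined = pd.Series([a | b for a, b in zip(sets['l1'], sets['l2'])],
                     index=sets.index)
```

Sets are unordered, so the order in which the values are displayed may vary.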
Upvotes: 16
Reputation: 12168
Alternatively, you can use agg:
g = df.groupby('c')[['l1','l2']].agg(['unique'])
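Note that this gives the unique values of each column separately, under the MultiIndex columns ('l1', 'unique') and ('l2', 'unique'). To get one list per group, as the question asks, the two arrays can be merged afterwards. A rough sketch, assuming the question's df; merged is a made-up name, and dict.fromkeys is used here only to drop duplicates while keeping first-seen order:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# Unique values per column and per group (MultiIndex columns)
g = df.groupby('c')[['l1', 'l2']].agg(['unique'])

# Merge the two arrays of each group, dropping duplicates
# while preserving order of first appearance
merged = pd.Series(
    [list(dict.fromkeys([*u1, *u2]))
     for u1, u2 in zip(g[('l1', 'unique')], g[('l2', 'unique')])],
    index=g.index,
)
```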
Upvotes: 69
Reputation:
You can do it with apply:
import numpy as np
g = df.groupby('c')[['l1','l2']].apply(lambda x: list(np.unique(x)))
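A pandas-only variant of the same idea is to reshape the two columns into one long column first and then reuse SeriesGroupBy.unique from the question. This is just a sketch on the question's df; long and result are illustrative names. Unlike np.unique, it keeps the values in order of first appearance instead of sorting them:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# Stack l1 and l2 into a single 'value' column, keeping c alongside
long = df.melt(id_vars='c', value_vars=['l1', 'l2'])

# Unique values per group across both original columns
result = long.groupby('c')['value'].unique()
```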
Upvotes: 62