Reputation: 3305
Here is a sample dataset:
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})
print(test)
   a  b    c
0  1  a  123
1  2  a  456
2  3  b  456
3  1  b  123
4  2  b  456
5  3  b  123
If I groupby columns 'a' and 'b' and then try to get a list of unique values ('c') in each group, I don't get the expected results using transform:
# using transform
print(test.groupby([
'a',
'b',
]).c.transform(pd.Series.unique))
0 123
1 456
2 456
3 123
4 456
5 123
If I use unique instead, I almost get the expected output:
# almost expected output
print(test.groupby([
'a',
'b',
]).c.unique())
a b
1 a [123]
b [123]
2 a [456]
b [456]
3 b [456, 123]
Name: c, dtype: object
What I was hoping for was a pd.Series that looks like this using transform:
0 [123]
1 [456]
2 [456, 123]
3 [123]
4 [456]
5 [456, 123]
dtype: object
I know that I can use transform to get the nunique values of 'c' as a series doing this:
print(test.groupby([
'a',
'b',
]).c.transform(pd.Series.nunique))
0 1
1 1
2 2
3 1
4 1
5 2
Name: c, dtype: int64
Why can't I do something similar with unique and transform?
I know that I can do the groupby and unique and then reset_index and merge with the original data, but I'm hoping for a more pythonic/pandas-friendly method.
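For reference, the groupby, unique, reset_index, and merge route described above could be sketched roughly as follows. The names c_unique, uniques, merged, and result are made up for illustration, and the sketch assumes the default 0..n-1 index on test.

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# One row per (a, b) group, holding that group's unique 'c' values.
# 'c_unique' is a hypothetical column name chosen for this sketch.
uniques = (
    test.groupby(['a', 'b'])['c']
        .unique()
        .rename('c_unique')
        .reset_index()
)

# A left merge keeps every original row, in the original row order.
merged = test.merge(uniques, on=['a', 'b'], how='left')
result = merged['c_unique']
print(result)
```

It works, but it takes two steps and returns a freshly numbered index, which is why a transform-style one-liner would be nicer.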
I also tried using set and transform, but that returned an error:
print(test.groupby([
'a',
'b',
]).c.transform(set))
TypeError: 'set' type is unordered
Upvotes: 2
Views: 213
Reputation: 150785
Does the following work for you?
test.groupby(['a','b'])['c'].transform('unique')
Output:
0 [123]
1 [456]
2 [456, 123]
3 [123]
4 [456]
5 [456, 123]
Name: c, dtype: object
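If transform does not accept the 'unique' string in your pandas version (which names it allows may differ between releases; that is an assumption here, not something verified in this thread), a join against the grouped result is one possible fallback. This is only a sketch: c_unique is a made-up column name, and test is rebuilt so the snippet runs on its own.

```python
import pandas as pd

test = pd.DataFrame({
    'a': [1, 2, 3] * 2,
    'b': ['a', 'a', 'b', 'b', 'b', 'b'],
    'c': [123, 456, 456, 123, 456, 123],
})

# Series indexed by (a, b) whose values are arrays of unique 'c' values.
# 'c_unique' is a hypothetical name; join needs the Series to be named.
uniques = test.groupby(['a', 'b'])['c'].unique().rename('c_unique')

# Join on the key columns against the MultiIndex of `uniques`.
# Unlike merge, join keeps the original index of `test`.
result = test.join(uniques, on=['a', 'b'])['c_unique']
print(result)
```

Because the join preserves the index of test, the result lines up row for row with the original frame, much like a transform result would.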
Upvotes: 3