Reputation: 79
With Pandas groupby, I can do things like this:
>>> df = pd.DataFrame(
... {
... "A": ["foo", "bar", "bar", "foo", "bar"],
... "B": ["one", "two", "three", "four", "five"],
... }
... )
>>> print(df)
A B
0 foo one
1 bar two
2 bar three
3 foo four
4 bar five
>>> print(df.groupby('A')['B'].unique())
A
bar [two, three, five]
foo [one, four]
Name: B, dtype: object
What I am looking for is output that produces a list of indices instead of a list of column B:
A
bar [1, 2, 4]
foo [0, 3]
However, groupby('A').index.unique() doesn't work. What syntax would provide me the output I'm after? I'd be more than happy to do this in some other way than with groupby, although I do need to group by two columns in my real application.
Upvotes: 7
Views: 2741
Reputation: 260480
You do not necessarily need to pass a column label to groupby; you can also pass a grouping object, such as a Series.
This enables things like:
df.index.to_series().groupby(df['A']).unique()
output:
A
bar [1, 2, 4]
foo [0, 3]
dtype: object
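To keep only the index of the first occurrence of each (A, B) pair, filter out the duplicated rows first:
df[~df[['A', 'B']].duplicated()].index.to_series().groupby(df['A']).unique()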
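Since the question mentions grouping by two columns, here is a minimal, self-contained sketch of the same grouping-object idea extended to two keys. The column "C" and its values are hypothetical, added only for illustration; the assumption is that passing a list of Series to groupby fits the real use case.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo", "bar"],
        "B": ["one", "two", "three", "four", "five"],
        # "C" is a hypothetical second key, not part of the original example
        "C": ["x", "y", "y", "x", "z"],
    }
)

# Turn the index into a Series so it can be grouped like any other column
idx = df.index.to_series()

# Group the index values by one key, passed as a Series
by_a = idx.groupby(df["A"]).unique()

# Group the index values by two keys, passed as a list of Series
by_a_c = idx.groupby([df["A"], df["C"]]).unique()

print(by_a)
print(by_a_c)
```

The second result is indexed by a MultiIndex of (A, C) pairs, so a single group can be looked up with by_a_c.loc[("bar", "y")].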
Upvotes: 4
Reputation: 8219
If you want the indices of the unique values in 'B', as opposed to the unique indices, you can do:
df.reset_index().groupby('A').apply(lambda g: g.drop_duplicates(['B'])['index'].tolist())
This differs from @Mayank's and @mozway's answers when applied to a slightly modified example df, in which 'one' appears twice in the 'foo' group:
df = pd.DataFrame(
{
"A": ["foo", "bar", "bar", "foo", "bar", "foo"],
"B": ["one", "two", "three", "four", "five", "one"],
}
)
My answer would return
A
bar [1, 2, 4]
foo [0, 3]
dtype: object
whereas @Mayank and @mozway would return
A
bar [1, 2, 4]
foo [0, 3, 5]
Name: index, dtype: object
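The difference between the two behaviours can be reproduced side by side with the sketch below. Note that it swaps the apply-based version for drop_duplicates followed by groupby, which should give the same indices on this data; the variable names all_idx and first_idx are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo", "bar", "foo"],
        "B": ["one", "two", "three", "four", "five", "one"],
    }
)

# Every row index per group: row 5 is kept even though "one" repeats in "foo"
all_idx = df.reset_index().groupby("A")["index"].agg(list)

# Only the index of the first occurrence of each distinct (A, B) pair
first_idx = (
    df.reset_index()
    .drop_duplicates(["A", "B"])
    .groupby("A")["index"]
    .agg(list)
)

print(all_idx)
print(first_idx)
```

Which of the two is wanted depends on whether repeated values of 'B' within a group should contribute their row index.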
Upvotes: 2
Reputation: 2983
One intuitive way is to copy the index into a new column; you can then keep using the same groupby code you already wrote.
df['index'] = df.index
df.groupby('A')['index'].unique()
Result:
A
bar [1, 2, 4]
foo [0, 3]
Name: index, dtype: object
Upvotes: 0
Reputation: 34046
Use df.reset_index with SeriesGroupBy.unique:
In [530]: df.reset_index().groupby('A')['index'].unique()
Out[530]:
A
bar [1, 2, 4]
foo [0, 3]
Name: index, dtype: object
OR:
In [533]: df.reset_index().groupby('A')['index'].agg(list)
Out[533]:
A
bar [1, 2, 4]
foo [0, 3]
Name: index, dtype: object
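The two variants print the same, but the element types differ, which may matter for downstream code. A small sketch using the question's DataFrame; the names as_arrays and as_lists are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo", "bar"],
        "B": ["one", "two", "three", "four", "five"],
    }
)

grouped = df.reset_index().groupby("A")["index"]

# unique() stores one NumPy array per group
as_arrays = grouped.unique()

# agg(list) stores one plain Python list per group
as_lists = grouped.agg(list)

print(type(as_arrays.loc["bar"]))
print(type(as_lists.loc["bar"]))
```

Because index values in a default RangeIndex are already unique, both hold the same numbers here; agg(list) would keep repeats if the index contained duplicates, while unique() would drop them.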
Upvotes: 2