zyxxyz

Reputation: 79

Include indices in Pandas groupby results

With Pandas groupby, I can do things like this:

>>> df = pd.DataFrame(
...     {
...         "A": ["foo", "bar", "bar", "foo", "bar"],
...         "B": ["one", "two", "three", "four", "five"],
...     }
... )
>>> print(df)
     A      B
0  foo    one
1  bar    two
2  bar  three
3  foo   four
4  bar   five
>>> print(df.groupby('A')['B'].unique())
A
bar    [two, three, five]
foo           [one, four]
Name: B, dtype: object

What I am looking for is output that lists the index labels of each group instead of the values from column B:

A
bar    [1, 2, 4]
foo    [0, 3]

However, groupby('A').index.unique() doesn't work. What syntax would provide me the output I'm after? I'd be more than happy to do this in some other way than with groupby, although I do need to group by two columns in my real application.

Upvotes: 7

Views: 2741

Answers (4)

mozway

Reputation: 260480

You do not necessarily need to pass a column label to groupby; you can pass a grouping object such as a Series instead.

This enables things like:

df.index.to_series().groupby(df['A']).unique()

Output:

A
bar    [1, 2, 4]
foo       [0, 3]
dtype: object

To get the indices of the unique B values instead:

df[~df[['A', 'B']].duplicated()].index.to_series().groupby(df['A']).unique()
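
The question mentions grouping by two columns in the real application. As a minimal sketch of how the grouping-object approach might extend to that case, the snippet below passes a list of Series to groupby. The column "C" and its values are hypothetical and are not part of the original data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo", "bar"],
        "B": ["one", "two", "three", "four", "five"],
        # Hypothetical second grouping key, not in the original question.
        "C": ["x", "x", "y", "x", "x"],
    }
)

# Group the index (as a Series) by two aligned Series instead of one.
idx_by_group = df.index.to_series().groupby([df["A"], df["C"]]).unique()

# Convert to plain Python lists keyed by the (A, C) tuple for easy inspection.
result = {key: [int(i) for i in vals] for key, vals in idx_by_group.items()}
print(result)
```

The result is indexed by a two-level MultiIndex, one level per grouping key.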

Upvotes: 4

piterbarg

Reputation: 8219

If you want the indices of the unique values in 'B', as opposed to the unique indices, you can do:

df.reset_index().groupby('A').apply(lambda g: g.drop_duplicates(['B'])['index'].tolist())

This differs from @Mayank's and @mozway's answers when applied to a slightly modified example df:

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo", "bar", "foo"],
        "B": ["one", "two", "three", "four", "five", "one"],
    }
)

My answer would return

A
bar    [1, 2, 4]
foo       [0, 3]
dtype: object

whereas @Mayank's and @mozway's answers would return

A
bar    [1, 2, 4]
foo    [0, 3, 5]
Name: index, dtype: object
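
To make the comparison reproducible, here is a small sketch that computes both results on the modified df. It is one possible formulation rather than the code from any of the answers: it drops duplicate (A, B) pairs before grouping instead of calling apply, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "bar", "foo", "bar", "foo"],
        "B": ["one", "two", "three", "four", "five", "one"],
    }
)

# Move the index into a regular column named "index".
flat = df.reset_index()

# Every index label in each group.
all_idx = flat.groupby("A")["index"].agg(list)

# Only the index label of the first occurrence of each (A, B) pair.
first_idx = flat.drop_duplicates(["A", "B"]).groupby("A")["index"].agg(list)

print(all_idx)
print(first_idx)
```

Row 5 repeats the pair ("foo", "one") already seen at row 0, so it appears in all_idx but not in first_idx.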

Upvotes: 2

Ka Wa Yip

Reputation: 2983

One intuitive way is to copy the index into a new column; you can then keep using the same code you wrote.

df['index'] = df.index
df.groupby('A')['index'].unique()


Upvotes: 0

Mayank Porwal

Reputation: 34046

Use df.reset_index with SeriesGroupBy.unique:

In [530]: df.reset_index().groupby('A')['index'].unique()
Out[530]: 
A
bar    [1, 2, 4]
foo       [0, 3]
Name: index, dtype: object

OR:

In [533]: df.reset_index().groupby('A')['index'].agg(list)
Out[533]: 
A
bar    [1, 2, 4]
foo       [0, 3]
Name: index, dtype: object

Upvotes: 2
