Vladimir Fokow

Reputation: 3883

How to iterate over pairs: a group and its next group?

I have a grouped dataframe:

import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2], 'b': range(5)})
g = df.groupby('a')
for key, gr in g:
    print(gr, '\n')

   a  b
0  0  0
1  0  1

   a  b
2  1  2
3  1  3

   a  b
4  2  4

I want to do a computation that needs each group and its next one (except the last group, of course).

So with this example I want to get two pairs:

# First pair:
   a  b
0  0  0
1  0  1

   a  b
2  1  2
3  1  3


# Second pair:
   a  b
2  1  2
3  1  3

   a  b
4  2  4

My attempt

If the groups were in a list instead, this would be easy:

for x, x_next in zip(lst[:-1], lst[1:]):
    ...

But unfortunately, selecting a slice doesn't work with a pd.DataFrameGroupBy object:

g[1:]       # TypeError: unhashable type: 'slice'. (It thinks I want to access the column by its name.)
g.iloc[1:]  # AttributeError: 'DataFrameGroupBy' object has no attribute 'iloc'

This question is related but it doesn't answer my question.

I am posting an answer myself, but maybe there are better or more efficient solutions (maybe pandas-native?).

Upvotes: 0

Views: 343

Answers (1)

Vladimir Fokow

Reputation: 3883

You can convert the DataFrameGroupBy object to a list of all groups (each element is a tuple of the grouping value and the group's DataFrame), and then iterate over this list:

lst = list(g)
for current, next_one in zip(lst[:-1], lst[1:]):
    ...
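
For reference, here is a minimal runnable sketch of the list approach on the example data from the question. The "computation" (difference of the sums of column b) and the names pair_keys and diffs are only placeholders I made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2], 'b': range(5)})
g = df.groupby('a')

# Each element of the list is a (key, group DataFrame) tuple.
lst = list(g)

pair_keys = []  # hypothetical: records which groups were paired
diffs = []      # hypothetical: placeholder computation per pair

# zip stops at the shorter input, so the last group has no partner.
for (key, gr), (next_key, next_gr) in zip(lst[:-1], lst[1:]):
    pair_keys.append((key, next_key))
    diffs.append(next_gr['b'].sum() - gr['b'].sum())
```

With the example data this pairs group 0 with group 1 and group 1 with group 2. Note that list(g) materializes every group at once, which is fine for a handful of groups but may use a lot of memory if there are many.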

Alternatively, create an iterator, and skip its first value:

it = iter(g)
next(it)
for current, next_one in zip(g, it):
    ...

A more complicated way:

g.groups returns a dictionary whose keys are the unique values of your grouping column and whose values are the index labels of the rows in each group (not the groups themselves). You could iterate over consecutive keys and fetch each group with g.get_group(key), but I think it would be unnecessarily complicated.
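
For completeness, a rough sketch of what that could look like. I sort the keys explicitly so the order matches the default (sorted) iteration order of the groupby; the name sizes is only a placeholder:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2], 'b': range(5)})
g = df.groupby('a')

# The keys of g.groups are the group keys; sort them explicitly
# rather than relying on the dictionary's ordering.
keys = sorted(g.groups)

sizes = []  # hypothetical: placeholder for the real computation
for key, next_key in zip(keys[:-1], keys[1:]):
    gr = g.get_group(key)            # DataFrame for the current group
    next_gr = g.get_group(next_key)  # DataFrame for the next group
    sizes.append((len(gr), len(next_gr)))
```

Each get_group call is a separate lookup, so this is more verbose than the two approaches above without any obvious benefit.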

Upvotes: 1
