Reputation: 3883
I have a grouped dataframe:
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2], 'b': range(5)})
g = df.groupby('a')
for key, gr in g:
    print(gr, '\n')
   a  b
0  0  0
1  0  1

   a  b
2  1  2
3  1  3

   a  b
4  2  4
I want to do a computation that needs each group and its next one (except the last group, of course).
So with this example I want to get two pairs:
# First pair:
   a  b
0  0  0
1  0  1
   a  b
2  1  2
3  1  3

# Second pair:
   a  b
2  1  2
3  1  3
   a  b
4  2  4
If the groups were in a list instead, this would be easy:
for x, x_next in zip(lst[:-1], lst[1:]):
    ...
But unfortunately, slicing doesn't work on a DataFrameGroupBy object:
g[1:] # TypeError: unhashable type: 'slice'. (It thinks I want to access the column by its name.)
g.iloc[1:] # AttributeError: 'DataFrameGroupBy' object has no attribute 'iloc'
This question is related but it doesn't answer my question.
I am posting an answer myself, but maybe there are better or more efficient solutions (maybe pandas-native?).
Upvotes: 0
Views: 343
Reputation: 3883
You can convert a DataFrameGroupBy object to a list that contains all groups (as tuples of grouping value and group), and then iterate over this list:
lst = list(g)
for current, next_one in zip(lst[:-1], lst[1:]):
    ...
Alternatively, create an iterator, and skip its first value:
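it = iter(g)
next(it)  # skip the first group, so `it` runs one step ahead
# zip stops as soon as `it` is exhausted, so the last group has no partner
for current, next_one in zip(g, it):
    ...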
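For reference, here is a minimal end-to-end sketch of the list-based approach on the example data from the question. The computation inside the loop (the difference between the group means of `b`) is only a hypothetical placeholder, and the name `pairs` is made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2], 'b': range(5)})
g = df.groupby('a')

# list(g) gives (key, sub-DataFrame) tuples in group order
lst = list(g)

pairs = []
for (key, gr), (next_key, next_gr) in zip(lst[:-1], lst[1:]):
    # Placeholder computation: difference between the means of 'b'
    diff = next_gr['b'].mean() - gr['b'].mean()
    pairs.append((key, next_key, diff))

print(pairs)
```

Unpacking each tuple in the `for` statement gives direct access to both the key and the sub-DataFrame of each group in the pair.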
A more complicated way: g.groups returns a dictionary whose keys are the unique values of your grouping column and whose values are the index labels of the rows in each group (not the groups themselves). You could iterate over the keys of this dictionary and fetch each group with g.get_group(key), but I think it would be unnecessarily complicated.
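For completeness, a rough sketch of what the dictionary-based route might look like, assuming a single grouping column and the default `sort=True`. The names `keys`, `group_pairs` and `pair_shapes` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2], 'b': range(5)})
g = df.groupby('a')

# Keys of g.groups are the group labels; sort them explicitly
# so the order matches the default sort=True iteration order.
keys = sorted(g.groups)

# Fetch each group and its successor by key
group_pairs = [
    (g.get_group(k), g.get_group(k_next))
    for k, k_next in zip(keys[:-1], keys[1:])
]

# Shapes of the DataFrames in each pair, just to inspect the result
pair_shapes = [(cur.shape, nxt.shape) for cur, nxt in group_pairs]
print(pair_shapes)
```

This calls `get_group` for every group, so it does more work than simply converting the groupby object to a list.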
Upvotes: 1