Konstantin

Reputation: 2629

Slicing a DataFrameGroupBy object

Is there a way to slice a DataFrameGroupBy object?

For example, if I have:

import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

   A  B
0  2  x
1  1  y
2  1  z
3  3  r
4  3  p

dfg = df.groupby('A')

Now, the returned GroupBy object is indexed by values from A, and I would like to select a subset of it, e.g. to perform aggregation. It could be something like

dfg.loc[1:2].agg(...)

or, for a specific column,

dfg['B'].loc[1:2].agg(...)

EDIT: To make it clearer, by slicing the GroupBy object I mean accessing only a subset of groups. In the above example, the GroupBy object contains 3 groups: A = 1, A = 2, and A = 3. For some reason, I may only be interested in the groups for A = 1 and A = 2.

Upvotes: 10

Views: 10188

Answers (2)

Anne

Reputation: 593

If I understand correctly, you only want some of the groups, but each of those groups should be returned in full:

    A   B
1   1   y
2   1   z
0   2   x

You can solve your problem by extracting the keys and then selecting groups based on those keys.

Assuming you already know the groups:

pd.concat([dfg.get_group(1),dfg.get_group(2)])
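
A minimal sketch generalizing this to any list of known keys and then aggregating the combined frame. The list keys and the ','.join aggregation are illustrative assumptions, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})
dfg = df.groupby('A')

keys = [1, 2]  # hypothetical: the group keys you already know you want

# get_group returns each group's rows as an ordinary DataFrame
subset = pd.concat([dfg.get_group(k) for k in keys])

# subset is a plain DataFrame, so it has to be grouped again before aggregating
result = subset.groupby('A')['B'].agg(','.join)
print(result)
```

Note that the concatenated rows keep their original index labels (1, 2, then 0 here).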

If you don't know the group names and just want the first n groups (here the first two, in sorted key order), this might work:

pd.concat([dfg.get_group(n) for n in list(dict(list(dfg)).keys())[:2]])
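
A slightly shorter variant, assuming the default sort=True in groupby: iterating a GroupBy already yields (key, sub-DataFrame) pairs in ascending key order, so the groups can be taken directly instead of rebuilding a dict and calling get_group again. The variable n is just an illustrative parameter:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})
dfg = df.groupby('A')

n = 2  # number of groups to keep
# Each item of list(dfg) is a (key, group) tuple; keep only the DataFrames
first_n = pd.concat([group for _, group in list(dfg)[:n]])
print(first_n)
```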

The output in both cases is a normal DataFrame, not a DataFrameGroupBy object, so it might be smarter to first filter your DataFrame and only aggregate afterwards:

df[df['A'].isin([1,2])].groupby('A')

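The same for unknown groups, taking the first two keys in sorted order (a set has no guaranteed order, so sort the unique values explicitly):

df[df['A'].isin(sorted(df['A'].unique())[:2])].groupby('A')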
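
A sketch of the filter-first approach carried through to an actual aggregation, assuming ','.join as the aggregation. Option 2 is an alternative not given in this answer: aggregate all groups, then label-slice the result with .loc, which is close to the dfg.loc[1:2] the question asks for. It computes every group, so it may be wasteful on large frames:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

# Option 1: drop unwanted groups before grouping, then aggregate
filtered = df[df['A'].isin([1, 2])].groupby('A')['B'].agg(','.join)

# Option 2: aggregate every group, then label-slice the result.
# The aggregated index is sorted, and .loc slicing includes both ends.
sliced = df.groupby('A')['B'].agg(','.join).loc[1:2]

print(filtered)
print(sliced)
```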

I believe there are other Stack Overflow answers covering this, such as "How to access pandas groupby dataframe by key".

Upvotes: 1

jezrael

Reputation: 862741

It seems you need a custom function with iloc. Note that if you use agg, the function must return a single aggregated value per group, for example by joining the sliced values into one string:

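# rows 0 to 2 (by position) of each group, joined into one string
out = df.groupby('A')['B'].agg(lambda x: ','.join(x.iloc[0:3]))
print(out)
A
1    y,z
2      x
3    r,p
Name: B, dtype: object

# rows 1 to 2 of each group; groups with a single row give an empty string
out = df.groupby('A')['B'].agg(lambda x: ','.join(x.iloc[1:3]))
print(out)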
A
1    z
2     
3    p
Name: B, dtype: object
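
If the same kind of slice is needed repeatedly, the lambda can be wrapped in a small factory. This is a sketch; join_rows is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

def join_rows(start, stop, sep=','):
    """Hypothetical helper: build an aggregation that joins rows start:stop of each group."""
    def agg_func(s):
        # s is one group's column as a Series; iloc slices by position
        # within that group, and join turns the slice into one scalar string
        return sep.join(s.iloc[start:stop])
    return agg_func

first_two = df.groupby('A')['B'].agg(join_rows(0, 2))
from_second = df.groupby('A')['B'].agg(join_rows(1, 3))
print(first_two)
print(from_second)
```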

For multiple columns:

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 
                   'B': ['x', 'y', 'z', 'r', 'p'], 
                   'C': ['g', 'y', 'y', 'u', 'k']})
print (df)
   A  B  C
0  2  x  g
1  1  y  y
2  1  z  y
3  3  r  u
4  3  p  k

df = df.groupby('A').agg(lambda x: ','.join(x.iloc[1:3]))
print (df)
   B  C
A      
1  z  y
2      
3  p  k
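
The question's group selection and this within-group slicing can also be combined: filter to the wanted keys first, then aggregate every remaining column. A sketch, assuming a hypothetical choice of groups (wanted) and taking only the first row of each group:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3],
                   'B': ['x', 'y', 'z', 'r', 'p'],
                   'C': ['g', 'y', 'y', 'u', 'k']})

wanted = [1, 3]  # hypothetical: the groups of interest
# keep only the wanted groups, then take the first row of each group
# in every non-key column and join it into a string
out = (df[df['A'].isin(wanted)]
       .groupby('A')
       .agg(lambda x: ','.join(x.iloc[0:1])))
print(out)
```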

Upvotes: 3
