berkelem
berkelem

Reputation: 2125

Filter within GroupBy objects in Pandas

Here's a sample dataframe:

import pandas as pd
df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3], 
                   'value':[42, 89, 250, 31, 130, 108, 107, 93]})

    ID  value
0    1     42
1    1     89
2    1    250
3    2     31
4    2    130
5    2    108
6    3    107
7    3     93

For each ID I want to extract the entries with values greater than 100.

Using groupby I can get the following

grouped = df.groupby('ID')
for name, group in grouped:
    print(name, group)

1    ID  value
0     1     42
1     1     89
2     1    250
2    ID  value
3     2     31
4     2    130
5     2    108
3    ID  value
6     3    107
7     3     93

I want to apply a condition to each group to get the following:

1    ID  value
2     1    250
2    ID  value
4     2    130
5     2    108
3    ID  value
6     3    107

I have tried using groupby.filter but that outputs a boolean condition for an entire group. I want to apply a boolean condition within a group. How can I do this?

EDIT: I should specify that the condition is different within each group, so I need to do groupby first.

Upvotes: 2

Views: 202

Answers (3)

SpiralDev
SpiralDev

Reputation: 7331

You can apply a function:

def f(df, n):
    return df[df['value'] > n]

df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3],
                   'value':[42, 89, 250, 31, 130, 108, 107, 93]})

res = df.groupby('ID').apply(lambda x: f(x, 100))

Upvotes: 0

jezrael
jezrael

Reputation: 863166

You can filter by boolean indexing before or in loop:

df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3], 
                   'value':[42, 89, 250, 31, 130, 108, 10, 93]})

print (df)
   ID  value
0   1     42
1   1     89
2   1    250
3   2     31
4   2    130
5   2    108
6   3     10
7   3     93

If some group have no value matched is omitted like group 3:

grouped = df[df['value'] > 100].groupby('ID')
for name, group in grouped:
    print(name, group)
1    ID  value
2   1    250
2    ID  value
4   2    130
5   2    108

Or if filter in loop is returned empty DataFrame for not matched group:

grouped = df.groupby('ID')
for name, group in grouped:
    print(name, group[group['value'] > 100])

1    ID  value
2   1    250
2    ID  value
4   2    130
5   2    108
3 Empty DataFrame
Columns: [ID, value]
Index: []

EDIT:

If want filter by values different for each group is possible solution with map by dictionary with ID column, then compare with value and filter by boolean indexing:

d = {1:100, 2: 121, 3: 10}

df = df[df['value'] > df['ID'].map(d)]
print (df)
   ID  value
2   1    250
4   2    130
7   3     93

Detail:

print (df['ID'].map(d))
0    100
1    100
2    100
3    121
4    121
5    121
6     10
7     10
Name: ID, dtype: int64

Upvotes: 5

giasuddin
giasuddin

Reputation: 133

We can do grater than query this way in pandas ..

import pandas as pd
df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3], 'value':[42, 89, 250, 31, 130, 108, 107,      93]})
df3 = df.query('value > 100')
print(df3.head())

Output will be

   ID  value
2   1    250
4   2    130
5   2    108
6   3    107

Upvotes: 0

Related Questions