Reputation: 2125
Here's a sample dataframe:
import pandas as pd
df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3],
'value':[42, 89, 250, 31, 130, 108, 107, 93]})
ID value
0 1 42
1 1 89
2 1 250
3 2 31
4 2 130
5 2 108
6 3 107
7 3 93
For each ID I want to extract the entries with values greater than 100.
Using groupby
I can get the following
grouped = df.groupby('ID')
for name, group in grouped:
print(name, group)
1 ID value
0 1 42
1 1 89
2 1 250
2 ID value
3 2 31
4 2 130
5 2 108
3 ID value
6 3 107
7 3 93
I want to apply a condition to each group to get the following:
1 ID value
2 1 250
2 ID value
4 2 130
5 2 108
3 ID value
6 3 107
I have tried using groupby.filter
but that outputs a boolean condition for an entire group. I want to apply a boolean condition within a group. How can I do this?
EDIT: I should specify that the condition is different within each group, so I need to do groupby first.
Upvotes: 2
Views: 202
Reputation: 7331
You can apply a function:
def f(df, n):
return df[df['value'] > n]
df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3],
'value':[42, 89, 250, 31, 130, 108, 107, 93]})
res = df.groupby('ID').apply(lambda x: f(x, 100))
Upvotes: 0
Reputation: 863166
You can filter by boolean indexing
before or in loop:
df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3],
'value':[42, 89, 250, 31, 130, 108, 10, 93]})
print (df)
ID value
0 1 42
1 1 89
2 1 250
3 2 31
4 2 130
5 2 108
6 3 10
7 3 93
If some group have no value matched is omitted like group 3
:
grouped = df[df['value'] > 100].groupby('ID')
for name, group in grouped:
print(name, group)
1 ID value
2 1 250
2 ID value
4 2 130
5 2 108
Or if filter in loop is returned empty DataFrame for not matched group:
grouped = df.groupby('ID')
for name, group in grouped:
print(name, group[group['value'] > 100])
1 ID value
2 1 250
2 ID value
4 2 130
5 2 108
3 Empty DataFrame
Columns: [ID, value]
Index: []
EDIT:
If want filter by values different for each group is possible solution with map
by dictionary with ID
column, then compare with value
and filter by boolean indexing
:
d = {1:100, 2: 121, 3: 10}
df = df[df['value'] > df['ID'].map(d)]
print (df)
ID value
2 1 250
4 2 130
7 3 93
Detail:
print (df['ID'].map(d))
0 100
1 100
2 100
3 121
4 121
5 121
6 10
7 10
Name: ID, dtype: int64
Upvotes: 5
Reputation: 133
We can do grater than query this way in pandas ..
import pandas as pd
df = pd.DataFrame({'ID':[1,1,1,2,2,2,3,3], 'value':[42, 89, 250, 31, 130, 108, 107, 93]})
df3 = df.query('value > 100')
print(df3.head())
Output will be
ID value
2 1 250
4 2 130
5 2 108
6 3 107
Upvotes: 0