Reputation: 147
I have a dataframe with three columns: ID, Col1, and Col2. I want to group the df by ID and Col1 and remove all rows in which the value of Col1 is lower than the value in Col2.
import pandas as pd
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'Col1': [0, 1, 2, 4, 0, 1, 2],
                   'Col2': [1, 1, 1, 1, 2, 2, 2]})
  ID  Col1  Col2
0  A     0     1
1  A     1     1
2  A     2     1
3  A     4     1
4  B     0     2
5  B     1     2
6  B     2     2
The end result should look like this:
  ID  Col1
0  A     1
1  A     2
2  A     4
3  B     2
I can do it by iterating over the dataframe and splitting it into chunks, but there must be a simpler and faster way to do this with Pandas functions.
Upvotes: 2
Views: 1239
Reputation: 164773
You can use pd.DataFrame.query. groupby is not required since you are not performing an aggregation.
res = df.query('Col1 >= Col2')[['ID', 'Col1']]
print(res)
  ID  Col1
1  A     1
2  A     2
3  A     4
6  B     2
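Note that the filtered rows keep their original index labels (1, 2, 3, 6), whereas the desired output in the question is numbered 0 to 3. If that renumbering matters, a minimal sketch, assuming the sample data from the question, could chain reset_index(drop=True) onto the query:

```python
import pandas as pd

# Sample data from the question, built as a single dict of columns
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Col1': [0, 1, 2, 4, 0, 1, 2],
    'Col2': [1, 1, 1, 1, 2, 2, 2],
})

# Keep rows where Col1 >= Col2, select the two columns of interest,
# then discard the old index labels and renumber from 0
res = df.query('Col1 >= Col2')[['ID', 'Col1']].reset_index(drop=True)
print(res)
```

With drop=True the old index is discarded rather than added as a new column.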
Upvotes: 2
Reputation: 863226
I think groupby is not necessary. Use boolean indexing with loc, which also lets you select columns by name:
df1 = df.loc[df['Col1'].ge(df['Col2']), ['ID', 'Col1']]
# alternative
# df1 = df.loc[df['Col1'] >= df['Col2'], ['ID', 'Col1']]
print(df1)
  ID  Col1
1  A     1
2  A     2
3  A     4
6  B     2
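As a quick sanity check, the sketch below (again assuming the question's sample data; the variable names are only illustrative) compares the .ge() mask with the >= mask and the loc result with the query-based one using equals:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Col1': [0, 1, 2, 4, 0, 1, 2],
    'Col2': [1, 1, 1, 1, 2, 2, 2],
})

# Two spellings of the same element-wise comparison
mask_method = df['Col1'].ge(df['Col2'])
mask_operator = df['Col1'] >= df['Col2']

# Filter rows with the boolean mask and pick columns by name in one loc call
by_loc = df.loc[mask_method, ['ID', 'Col1']]

# The same filter expressed through query
by_query = df.query('Col1 >= Col2')[['ID', 'Col1']]

same_mask = mask_method.equals(mask_operator)
same_result = by_loc.equals(by_query)
print(same_mask, same_result)
```

Which form to use is mostly a matter of taste: query reads closer to the problem statement, while loc with a mask avoids parsing an expression string.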
Upvotes: 2