1nfern0

Reputation: 147

Python Pandas group by and exclude values based on another column

I have a dataframe with three columns: ID, Col1, and Col2. I want to group the df by ID and Col1 and remove every row where the value in Col1 is lower than the value in Col2.

import pandas as pd

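# All three columns go in a single dict; separate dicts would be
# interpreted as the data, index, and columns arguments.
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'Col1': [0, 1, 2, 4, 0, 1, 2],
                   'Col2': [1, 1, 1, 1, 2, 2, 2]})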

  ID  Col1  Col2
0  A     0     1
1  A     1     1
2  A     2     1
3  A     4     1
4  B     0     2
5  B     1     2
6  B     2     2

The end result should look like this:

  ID  Col1
0  A     1
1  A     2
2  A     4
3  B     2

I can do it by iterating over the dataframe and splitting it into chunks, but there must be a simpler and faster way to do this with Pandas functions.

Upvotes: 2

Views: 1239

Answers (2)

jpp

Reputation: 164773

You can use pd.DataFrame.query. groupby is not required since you are not performing an aggregation.

res = df.query('Col1 >= Col2')[['ID', 'Col1']]

print(res)

  ID  Col1
1  A     1
2  A     2
3  A     4
6  B     2
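The output above keeps the original row labels (1, 2, 3, 6), while the desired result in the question is numbered 0 to 3. A minimal sketch of one way to get that numbering, assuming the original labels do not need to be preserved, is to chain reset_index(drop=True) onto the same query:

```python
import pandas as pd

# Sample data from the question, built from a single dict.
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'Col1': [0, 1, 2, 4, 0, 1, 2],
                   'Col2': [1, 1, 1, 1, 2, 2, 2]})

# Keep rows where Col1 >= Col2, select the two columns of interest,
# then discard the old row labels and renumber from 0.
res = df.query('Col1 >= Col2')[['ID', 'Col1']].reset_index(drop=True)

print(res)
```

Without drop=True, reset_index would also add the old labels back as a new column named index.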

Upvotes: 2

jezrael

Reputation: 863226

I think groupby is not necessary here. Use boolean indexing with loc to filter the rows and select the columns by name:

df1 = df.loc[df['Col1'].ge(df['Col2']), ['ID', 'Col1']]
# alternative
# df1 = df.loc[df['Col1'] >= df['Col2'], ['ID', 'Col1']]

print(df1)

  ID  Col1
1  A     1
2  A     2
3  A     4
6  B     2
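To make the boolean indexing step easier to follow, here is a small sketch that builds the mask separately before passing it to loc. The variable name mask is only illustrative; the data is the sample from the question.

```python
import pandas as pd

# Sample data from the question, built from a single dict.
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'Col1': [0, 1, 2, 4, 0, 1, 2],
                   'Col2': [1, 1, 1, 1, 2, 2, 2]})

# Row-wise comparison: True where Col1 is greater than or equal to Col2.
mask = df['Col1'].ge(df['Col2'])
print(mask.tolist())
# [False, True, True, True, False, False, True]

# loc takes the mask as the row selector and a list of column names
# as the column selector.
df1 = df.loc[mask, ['ID', 'Col1']]
print(df1)
```

Because the comparison is done row by row, each row is already checked against its own Col2 value, which is why no grouping by ID is needed.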

Upvotes: 2
