Python Pandas group by and exclude values based on another column

Question

I have a dataframe with three columns: ID, Col1, and Col2. I want to group the df by ID and Col1 and remove all rows where the value in Col1 is lower than the value in Col2.

import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                   'Col1': [0, 1, 2, 4, 0, 1, 2],
                   'Col2': [1, 1, 1, 1, 2, 2, 2]})

  ID  Col1  Col2
0  A     0     1
1  A     1     1
2  A     2     1
3  A     4     1
4  B     0     2
5  B     1     2
6  B     2     2

The end result should look like this

I can do it by iterating over the dataframe and splitting it into chunks, but there must be a simpler and faster way to do this with Pandas functions.

jezrael · Accepted Answer

I think groupby is not necessary here, because the comparison is made row by row. Use boolean indexing with loc to filter the rows, and select the columns to keep by name:

df1 = df.loc[df['Col1'].ge(df['Col2']), ['ID', 'Col1']]
# alternative
# df1 = df.loc[df['Col1'] >= df['Col2'], ['ID', 'Col1']]

print(df1)

  ID  Col1
1  A     1
2  A     2
3  A     4
6  B     2
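
For reference, here is a minimal self-contained sketch of the same filter, split into steps so the boolean mask is visible. The variable names mask, filtered, and filtered_query are illustrative and not from the original answer, and the DataFrame.query variant is offered as one possible equivalent spelling rather than the answer's method.

```python
import pandas as pd

# Sample data from the question, built from a single dict of columns.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "A", "B", "B", "B"],
    "Col1": [0, 1, 2, 4, 0, 1, 2],
    "Col2": [1, 1, 1, 1, 2, 2, 2],
})

# Row-wise boolean mask: True where Col1 is greater than or equal to Col2.
# No groupby is involved, since each row is compared with its own Col2 value.
mask = df["Col1"] >= df["Col2"]

# Keep the matching rows and only the ID and Col1 columns.
filtered = df.loc[mask, ["ID", "Col1"]]

# The same filter expressed with DataFrame.query (illustrative alternative).
filtered_query = df.query("Col1 >= Col2")[["ID", "Col1"]]

print(filtered)
```

Both variants keep the original index labels, so call reset_index(drop=True) on the result if a fresh 0-based index is wanted.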
