cck1110

Reputation: 55

Pandas: filter rows based on a condition, but always retain the first row

I would like to drop rows that meet a certain condition, but I do not want to drop the first row even if it meets that condition.

I tried dropping rows with df.drop, but that also removes the first row when it meets the condition, which is not what I want.

The data looks something like this:

Column1 Column2 Column3
  1        3      A
  2        1      B
  3        3      C
  4        1      D
  5        1      E
  6        3      F

I want to drop every row that has a value of 3 in Column2.

The result should look like this (the rows with 3 are dropped, but the first row is kept even though it has a value of 3 in Column2):

Column1 Column2 Column3
  1        3      A
  2        1      B
  4        1      D
  5        1      E
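For reference, the sample data can be reproduced with something like this (a minimal sketch using plain pandas and the column names shown above):

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Column1': [1, 2, 3, 4, 5, 6],
    'Column2': [3, 1, 3, 1, 1, 3],
    'Column3': list('ABCDEF'),
})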

Upvotes: 3

Views: 1862

Answers (3)

cs95

Reputation: 402513

You can make "retain the first row" part of your condition for keeping or dropping rows.

The condition for keeping the rows would be <ORIGINAL CONDITION> or <CONDITION TO KEEP FIRST ROW>. In code, this is

# (keep rows where Column2 is not 3) | (keep the 0th row)
df[(df['Column2'] != 3) | (df.index == 0)]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

Conversely, by De Morgan's laws, the condition for dropping rows would be (df['Column2'] == 3) & (df.index != 0). We then invert that condition to get the expected output:

df[~((df['Column2'] == 3) & (df.index != 0))]

   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E

Both of these assume your index is the default RangeIndex, so that label 0 refers to the first row. If it is not, use pd.RangeIndex(len(df)) == 0 as the second condition instead, which checks the position rather than the label.
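For example, a quick sketch of the positional variant (here Column3 is promoted to the index purely to illustrate a non-default index):

# Label 0 no longer exists, so check the position instead of the index label
df2 = df.set_index('Column3')
df2[(df2['Column2'] != 3) | (pd.RangeIndex(len(df2)) == 0)]

This keeps the first row (even though its Column2 is 3) plus all rows whose Column2 is not 3.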

Upvotes: 2

Erfan

Reputation: 42916

There are already great answers posted here, but for convenience: we can also use cumcount to number each occurrence of 3 and keep only the first one:

df = df[~((df.groupby('Column2').Column2.cumcount() != 0) & (df.Column2 == 3))]

print(df)
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E
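To see what the mask is doing, here is the intermediate cumcount on the sample data (values shown as comments, just for illustration):

# Number the occurrences within each Column2 group: 0, 1, 2, ...
df.groupby('Column2').Column2.cumcount()
# 0    0   <- first 3, kept
# 1    0   <- first 1
# 2    1   <- second 3, dropped
# 3    1
# 4    2
# 5    2   <- third 3, dropped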

Upvotes: 2

BENY

Reputation: 323236

Using duplicated, which keeps the first occurrence of 3 (here the first row) together with all rows that are not 3:

df[(~df.Column2.duplicated()) | df.Column2.ne(3)]
   Column1  Column2 Column3
0        1        3       A
1        2        1       B
3        4        1       D
4        5        1       E
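For reference, the intermediate duplicated mask on the sample data (values shown as comments, just for illustration):

# True for every Column2 value that has already appeared above
df.Column2.duplicated()
# 0    False  <- first 3, kept by ~duplicated()
# 1    False
# 2     True  <- repeated 3 and equal to 3, so dropped
# 3     True  <- repeated 1, but kept because it is not 3
# 4     True
# 5     True  <- repeated 3, dropped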

Upvotes: 2
