ysl
ysl

Reputation: 51

Delete rows if combination of two column values is satisfied in python

I've below data: I need to delete all rows which have a combination of (ColX='F' and ColY='Y'). I'm trying the below different codes as suggested by below answers.

  1. df = df[(df.ColX != 'F')&(df.ColY !='Y')] But this code is removing all rows with (ColX=P and ColY=Y) along with (ColX=F and ColY=Y). I need rows with (ColX=P and ColY=Y) in my data.

  2. mask = (df.ColX == 'F') & (df.ColY == 'Y') df[~mask] This code is not removing any of the rows which I want to remove.

    3.mask = (df.ColX == 'F') | (df.ColY == 'Y') df[~mask] This code is removing all rows with (ColX=P and ColY=Y) along with (ColX=F and ColY=Y). I need rows with (ColX=P and ColY=Y) in my data.

Any inputs are highly appreciated. Thank you!

 line_date      ColX       ColY    ColZ   rating        rw    

 2007-03-31       P         Y        56     1.000000  56.000000
 2007-03-10       P         Y        67     1.000000  67.000000
 2007-02-10       F         Y        66     1.000000  66.000000
 2007-01-13       2                  83     0.880678  73.096278
 2006-12-23       2         Y        88     0.793033  69.786942
 2006-11-09       F                  52     0.636655  33.106077
 2006-10-22       P                  66     0.581946  38.408408
 2006-09-29       F         Y        70     0.518825  36.317752

Upvotes: 2

Views: 2579

Answers (3)

ysl
ysl

Reputation: 51

The below code gave me my expected dataset:

filter_df=df[ ~(df['ColY'].str.contains("Y") & df['ColX'].str.contains("F")) ]

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153510

Try this:

mask = (df.ColX == 'F') & (df.ColY == 'Y')
df[~mask]

Output:

    line_date ColX ColY  ColZ    rating         rw
0  2007-03-31    P    Y    56  1.000000  56.000000
1  2007-03-10    P    Y    67  1.000000  67.000000
3  2007-01-13    2  NaN    83  0.880678  73.096278
4  2006-12-23    2    Y    88  0.793033  69.786942
5  2006-11-09    F  NaN    52  0.636655  33.106077
6  2006-10-22    P  NaN    66  0.581946  38.408408

Or, because I am a little confused what your output should be.

mask = (df.ColX == 'F') | (df.ColY == 'Y')
df[~mask]

Output:

    line_date ColX ColY  ColZ    rating         rw
3  2007-01-13    2  NaN    83  0.880678  73.096278
6  2006-10-22    P  NaN    66  0.581946  38.408408

Upvotes: 2

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210932

you can use DataFrame.query() method:

In [20]: df.query("ColX != 'F' or ColY != 'Y'")
Out[20]:
    line_date ColX ColY  ColZ    rating         rw
0  2007-03-31    P    Y    56  1.000000  56.000000
1  2007-03-10    P    Y    67  1.000000  67.000000
3  2007-01-13    2  NaN    83  0.880678  73.096278
4  2006-12-23    2    Y    88  0.793033  69.786942
5  2006-11-09    F  NaN    52  0.636655  33.106077
6  2006-10-22    P  NaN    66  0.581946  38.408408

Upvotes: 2

Related Questions