Dribbler
Dribbler

Reputation: 4701

How to select rows from a dataframe were any value is not equal to a specified value

I've been able to filter Pandas dataframe rows containing a number:

import pandas as pd    

df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})

   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2
2   2   3  NaN   3

df1 = df[(df.values == 1)]

   c1  c2   c3  c4
0   3   3  2.0   1
1   1   3  5.0   2

But if I try to filter based excluding a number, I get a really strange result with repeated rows:

df1 = df[(df.values != 1)]

   c1  c2   c3  c4
0   3   3  2.0   1
0   3   3  2.0   1
0   3   3  2.0   1
1   1   3  5.0   2
1   1   3  5.0   2
1   1   3  5.0   2
2   2   3  NaN   3
2   2   3  NaN   3
2   2   3  NaN   3
2   2   3  NaN   3

Why is that? And how can I filter only the rows that don't contain the specified number?

Thanks in advance!

Upvotes: 6

Views: 12324

Answers (4)

RightmireM
RightmireM

Reputation: 2492

Basically, use the filter to create an index, reverse the index, and then select the rows based off that index.

import pandas as pd    

df = pd.DataFrame({'c1': [3, 1, 2,   1, 3], 
                   'c2': [3, 3, 3,   2, 3], 
                   'c3': [2, 5, None,3, 3], 
                   'c4': [1, 2, 3,   1, 3]})
print(df)
# Create an index based on any row containing 1
index = df.values == 1
print(index)
# This reverses the index. 
# I.e. 
#[False False False  True] will equal False, since True is in the list
#[True False False False]  will equal False, since True is in the list
#[False False False False] will equal True,  since True is NOT in the list
index = [True if True not in l else False for l in index]
# Pick out only the rows where the index is true
df1 = df[index]
print(df1)

Upvotes: 2

Andy L.
Andy L.

Reputation: 25269

Look at this mask

In [88]: df.values != 1
Out[88]:
array([[ True,  True,  True, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])

Slicing base on numpy convention. Every True will be repeated, so you have repeated rows in the output. You need additional all to check each row on all True and return a single True/False for each row.

df[(df.values != 1).all(1)]

Out[87]:
   c1  c2  c3  c4
2   2   3 NaN   3

Note: my intention is reusing your code, so I didn't change it. While the concise code should be.

df[(df != 1).all(1)]

or

df[df.ne(1).all(1)]

Upvotes: 10

Alex
Alex

Reputation: 1126

try this:

indexes = [x for x in range(len(list(df.values))) if 1 not in df.values[x]] # get indexes where 1 not appear
df.iloc[indexes]

output:

    c1  c2  c3  c4
2   2   3   NaN 3

Upvotes: 2

ansev
ansev

Reputation: 30930

Use DataFrame.any:

df[~df.eq(1).any(axis=1)]

Output:

   c1  c2  c3  c4
2   2   3 NaN   3

Upvotes: 6

Related Questions