Reputation: 4701
I've been able to filter Pandas dataframe rows containing a number:
import pandas as pd
df = pd.DataFrame({'c1': [3, 1, 2], 'c2': [3, 3, 3], 'c3': [2, 5, None], 'c4': [1, 2, 3]})
c1 c2 c3 c4
0 3 3 2.0 1
1 1 3 5.0 2
2 2 3 NaN 3
df1 = df[(df.values == 1)]
c1 c2 c3 c4
0 3 3 2.0 1
1 1 3 5.0 2
But if I try to filter based excluding a number, I get a really strange result with repeated rows:
df1 = df[(df.values != 1)]
c1 c2 c3 c4
0 3 3 2.0 1
0 3 3 2.0 1
0 3 3 2.0 1
1 1 3 5.0 2
1 1 3 5.0 2
1 1 3 5.0 2
2 2 3 NaN 3
2 2 3 NaN 3
2 2 3 NaN 3
2 2 3 NaN 3
Why is that? And how can I filter only the rows that don't contain the specified number?
Thanks in advance!
Upvotes: 6
Views: 12324
Reputation: 2492
Basically, use the filter to create an index, reverse the index, and then select the rows based off that index.
import pandas as pd
df = pd.DataFrame({'c1': [3, 1, 2, 1, 3],
'c2': [3, 3, 3, 2, 3],
'c3': [2, 5, None,3, 3],
'c4': [1, 2, 3, 1, 3]})
print(df)
# Create an index based on any row containing 1
index = df.values == 1
print(index)
# This reverses the index.
# I.e.
#[False False False True] will equal False, since True is in the list
#[True False False False] will equal False, since True is in the list
#[False False False False] will equal True, since True is NOT in the list
index = [True if True not in l else False for l in index]
# Pick out only the rows where the index is true
df1 = df[index]
print(df1)
Upvotes: 2
Reputation: 25269
Look at this mask
In [88]: df.values != 1
Out[88]:
array([[ True, True, True, False],
[False, True, True, True],
[ True, True, True, True]])
Slicing base on numpy convention. Every True
will be repeated, so you have repeated rows in the output. You need additional all
to check each row on all True
and return a single True/False for each row.
df[(df.values != 1).all(1)]
Out[87]:
c1 c2 c3 c4
2 2 3 NaN 3
Note: my intention is reusing your code, so I didn't change it. While the concise code should be.
df[(df != 1).all(1)]
or
df[df.ne(1).all(1)]
Upvotes: 10
Reputation: 1126
try this:
indexes = [x for x in range(len(list(df.values))) if 1 not in df.values[x]] # get indexes where 1 not appear
df.iloc[indexes]
output:
c1 c2 c3 c4
2 2 3 NaN 3
Upvotes: 2
Reputation: 30930
Use DataFrame.any:
df[~df.eq(1).any(axis=1)]
Output:
c1 c2 c3 c4
2 2 3 NaN 3
Upvotes: 6