Reputation: 10150
Let's say I have a DataFrame like this:
id num
0 1 1
1 2 2
2 3 1
3 4 2
4 1 1
5 2 2
6 3 1
7 4 2
The above can be generated with this for testing purposes:
import numpy as np
import pandas as pd
test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})
Now, I want to keep only the rows where a certain condition is met: id is not 1 AND num is not 1. Essentially, I want to remove the rows with index 0 and 4. For my actual dataset it's easier to remove the rows I don't want than to specify the rows that I do want.
I have tried this:
test = test[(test['id'] != 1) & (test['num'] != 1)]
However, that gives me this:
id num
1 2 2
3 4 2
5 2 2
7 4 2
It seems to have removed all rows where id is 1 OR num is 1.
I've seen a number of other questions where the answer is the one I used above, but it doesn't seem to work in my case.
Upvotes: 6
Views: 12568
Reputation: 394041
If you change the boolean conditions to equality checks, combine them with &, and then invert the combined condition with ~ (enclosing the whole expression in an extra pair of parentheses), you get the desired behaviour:
In [14]:
test = test[~((test['id'] == 1) & (test['num'] == 1))]
test
Out[14]:
id num
1 2 2
2 3 1
3 4 2
5 2 2
6 3 1
7 4 2
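Since the question notes that it's easier to describe the rows to remove, here is a minimal sketch of an alternative that is not part of the original answer: build a mask of the unwanted rows and pass their index labels to DataFrame.drop. It should give the same rows as the ~ filter above; the names unwanted and result are illustrative.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

# Mask describing the rows to REMOVE: id == 1 and num == 1 at the same time
unwanted = (test['id'] == 1) & (test['num'] == 1)

# Drop those rows by their index labels instead of selecting the rows to keep
result = test.drop(test[unwanted].index)
print(result)
```

This keeps the "describe what to remove" style from the question while leaving the original index labels intact.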
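I also think your understanding of the boolean logic is incorrect: what you want is to OR (|) the conditions. By De Morgan's law, ~(A & B) is the same as ~A | ~B, so negating each equality and switching & to | gives the same result: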
In [22]:
test = test[(test['id'] != 1) | (test['num'] != 1)]
test
Out[22]:
id num
1 2 2
2 3 1
3 4 2
5 2 2
6 3 1
7 4 2
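A quick way to convince yourself that the two filters above are interchangeable is to compare the masks directly. This is a small sketch (the names negated_and and or_of_negations are just illustrative) that rebuilds the test frame and checks the De Morgan equivalence element by element.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

# ~(A & B): invert the combined equality conditions
negated_and = ~((test['id'] == 1) & (test['num'] == 1))

# ~A | ~B: negate each condition individually and OR them
or_of_negations = (test['id'] != 1) | (test['num'] != 1)

print(negated_and.equals(or_of_negations))  # True
```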
If you think about what this means, the first condition on its own excludes any row where 'id' is equal to 1, and the second does the same for the 'num' column:
In [24]:
test[test['id'] != 1]
Out[24]:
id num
1 2 2
2 3 1
3 4 2
5 2 2
6 3 1
7 4 2
In [25]:
test[test['num'] != 1]
Out[25]:
id num
1 2 2
3 4 2
5 2 2
7 4 2
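So what you really wanted was to OR (|) the above conditions, not AND (&) them: with &, a row is kept only if it survives both filters, which is why rows 2 and 6 (num is 1) were dropped as well.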
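To see the difference row by row, here is a small sketch that lays out both individual conditions next to their & and | combinations as a truth table. The column labels are arbitrary and chosen only for readability.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'id': np.array([1, 2, 3, 4] * 2, dtype='int32'),
                     'num': np.array([1, 2] * 4, dtype='int32')})

# One boolean column per individual condition
masks = pd.DataFrame({
    'id != 1': test['id'] != 1,
    'num != 1': test['num'] != 1,
})

# Combine them both ways to compare which rows each version keeps
masks['AND (&)'] = masks['id != 1'] & masks['num != 1']
masks['OR (|)'] = masks['id != 1'] | masks['num != 1']
print(masks)
```

Rows 2 and 6 are True for id != 1 but False for num != 1, so & drops them while | keeps them; only rows 0 and 4 are False in both columns.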
Upvotes: 13