Reputation: 1731
This is very much related to Select rows from a DataFrame based on list values in a column in pandas, where instead of looking for just ONE column, I want to look for matching tuples in two columns.
Reproducible example
import pandas as pd
df = pd.DataFrame([[1, 2, 'a'], [1, 3, 'b'], [1, 3, 'c'], [1, 4, 'b'], [1, 2, 'c'], [1, 7, 'c']])
df.columns = ['id1', 'id2', 'name']
print(df)
Sample DataFrame
id1 id2 name
0 1 2 a
1 1 3 b
2 1 3 b
3 1 4 b
4 1 2 c
5 1 7 c
I have a list of tuples!
badTuples = [(1, 2), (1, 3), (1, 5)]
In the first element of the tuples, (1, 2)
; 1
is used to refer to the value in'id1'
column, and 2
is used to refer to the value in'id2'
column.
I want to remove such rows, which have a matching ('id1', 'id2')
tuple elements! It is sometimes the case that in the list of tuples
there are pairs (1, 5)
which ARE NOT PRESENT in the DataFrame! In my example, we have only (1, 2), (1, 3) and (1, 4)
as id1, id2
pairs in the DataFrame!
In the case, I want that pair (1, 5)
to be skipped.
The output DataFrame should be:
id1 id2 name
3 1 4 b
5 1 7 c
Upvotes: 1
Views: 2159
Reputation: 863291
Use:
df = df[~pd.Series(list(zip(df['id1'], df['id2']))).isin(badTuples)]
Or:
df = df[~df[['id1','id2']].apply(tuple, 1).isin(badTuples)]
print(df)
id1 id2 name
3 1 4 b
5 1 7 c
Upvotes: 3