Pandas: How to remove rows from a dataframe based on a list of tuples representing values in TWO columns?

Question

This is very much related to Select rows from a DataFrame based on list values in a column in pandas, where instead of looking for just ONE column, I want to look for matching tuples in two columns.

Reproducible example

import pandas as pd

df = pd.DataFrame([[1, 2, 'a'], [1, 3, 'b'], [1, 3, 'c'], [1, 4, 'b'], [1, 2, 'c'], [1, 7, 'c']])
df.columns = ['id1', 'id2', 'name']

print(df)

Sample DataFrame

    id1  id2 name
0    1    2    a
1    1    3    b
2    1    3    b
3    1    4    b
4    1    2    c
5    1    7    c

I have a list of tuples!

badTuples = [(1, 2), (1, 3), (1, 5)]

In the first element of the tuples, (1, 2); 1 is used to refer to the value in'id1' column, and 2 is used to refer to the value in'id2' column.

I want to remove such rows, which have a matching ('id1', 'id2') tuple elements! It is sometimes the case that in the list of tuples there are pairs (1, 5) which ARE NOT PRESENT in the DataFrame! In my example, we have only (1, 2), (1, 3) and (1, 4) as id1, id2 pairs in the DataFrame!

In the case, I want that pair (1, 5) to be skipped.

The output DataFrame should be:

    id1  id2 name
3    1    4    b
5    1    7    c

jezrael · Accepted Answer

Use:

df = df[~pd.Series(list(zip(df['id1'], df['id2']))).isin(badTuples)]

Or:

df = df[~df[['id1','id2']].apply(tuple, 1).isin(badTuples)]

print(df)
   id1  id2 name
3    1    4    b
5    1    7    c

Pandas: How to remove rows from a dataframe based on a list of tuples representing values in TWO columns?

Answers (1)

Related Questions