Chins Kuriakose
Chins Kuriakose

Reputation: 149

Selecting rows from a pandas dataframe based on the values of some columns in a list of the same dataframe?

Let's suppose, there is a dataframe :

df1 = 
   A  B   C
0  1  a  a1
1  2  b  b2
2  3  c  c3
3  4  d  d4
4  5  e  e5
5  6  f  f6

Created as :

a1 = [1,2,3,4,5,6]
a2 = ['a','b','c','d','e','f']
a3 = ['a1','b2','c3','d4','e5','f6']
df1 = pd.DataFrame(list(zip(a1,a2,a3)),columns=["A","B","C"])

Here, I am considering Columns A and B to be something like primary keys for this dataframe. So, PK = ["A","B"].

I have another list, list1 = [[2,'b'],[5,'e']], which is a subset of the dataframe df[PK].

Is there any way I can get the rows corresponding to these primary key values inside the list from the dataframe df?

Something like : df1 = df[df[PK].values.isin(list1)] which doesn't work as I expect.

I would like to get an output df1 as :

df1 =
   A  B   C
1  2  b  b2
4  5  e  e5

There are some similar questions, which I have gone through in this portal. But none of them showed me how to select rows based on filter on multiple columns as mentioned above. Thanks in advance.

Upvotes: 0

Views: 37

Answers (1)

Red
Red

Reputation: 27547

Here is how you can use pandas.DataFrame.merge():

import pandas as pd
a1 = [1,2,3,4,5,6]
a2 = ['a','b','c','d','e','f']
a3 = ['a1','b2','c3','d4','e5','f6']

df1 = pd.DataFrame(list(zip(a1,a2,a3)),columns=["A","B","C"])

PK = ["A","B"]

list1 = [[2,'b'],[5,'e']]

df2 = df1.merge(pd.DataFrame(list1,columns=PK),on=PK)

print(df2)

Output:

   A  B   C
0  2  b  b2
1  5  e  e5

Upvotes: 1

Related Questions