Gunners4Ever
Gunners4Ever

Reputation: 65

Using Pandas to subset data from a dataframe based on multiple columns?

I am new to python. I have to extract a subset from pandas dataframe based on 2 lists corresponding to 2 columns in that dataframe. Both the values in list should match with that of dataframe at index level. I have tried with "isin" function but obviously it doesn't work with combinations.

from pandas import *

d = {'A' : ['a', 'a', 'c', 'a','b'] ,'B' : [1, 2, 1, 4,1]}

df = DataFrame(d)
list1 = ['a','b']
list2 = [1,2]

print df

   A  B
0  a  1
1  a  2
2  c  1
3  a  4
4  b  1

### Using isin function
df[(df.A.isin(list1)) & (df.B.isin(list2)) ]
   A  B
0  a  1
1  a  2
4  b  1

###Desired outcome
d2 = {'A' : ['a'], 'B':[1]}
DataFrame(d2)

   A  B
0  a  1

Please let me know if this can be done without using loops and if there is a way to do it in a single step.

Upvotes: 1

Views: 140

Answers (1)

Silenced Temporarily
Silenced Temporarily

Reputation: 1004

A quick and dirty way to do this is using zip:

df['C'] = zip(df['A'], df['B'])
list3 = zip(list1, list2)

d2 = df[df['C'].isin(list3)

print(df2)
   A  B       C
0  a  1  (a, 1)

You can of course drop the newly created column after you're done filtering on it.

Upvotes: 1

Related Questions