V.Nouri
V.Nouri

Reputation: 277

How to filter pandas dataframe rows based on dictionary keys and values?

I have a dataframe and a dictionary in Python as shown below and I need to filter the dataframe based on the dictionary. As you see, the keys and values of the dictionary are two columns of the dataframe. I want to have a subset of dataframe which contains the keys and values of dictionary plus other columns.

df :

Customer_ID Category Type Delivery
40275 Book Buy True
40275 Software Sell False
40275 Video Game Sell False
40275 Cell Phone Sell False
39900 CD/DVD Sell True
39900 Book Buy True
39900 Software Sell True
35886 Cell Phone Sell False
35886 Video Game Buy False
35886 CD/DVD Sell False
35886 Software Sell False
40350 Software Sell True
28129 Software Buy False

And dictionary is:

d = {
 40275: ['Book','Software'],
 39900: ['Book'],
 35886: ['Software'],
 40350: ['Software'],
 28129: ['Software']
 }

And I need the following dataframe:

Customer_ID Category Type Delivery
40275 Book Buy True
40275 Software Sell False
39900 Book Buy True
35886 Software Sell False
40350 Software Sell True
28129 Software Buy False

Upvotes: 2

Views: 3963

Answers (3)

Henry Ecker
Henry Ecker

Reputation: 35626

We can set_index to the Customer_ID and Category columns then build a list of tuples from the dictionary d and reindex the DataFrame to include only the rows which match the list of tuples, then reset_index to restore the columns:

new_df = df.set_index(['Customer_ID', 'Category']).reindex(
    [(k, v) for k, lst in d.items() for v in lst]
).reset_index()

new_df:

   Customer_ID  Category  Type  Delivery
0        40275      Book   Buy      True
1        40275  Software  Sell     False
2        39900      Book   Buy      True
3        35886  Software  Sell     False
4        40350  Software  Sell      True
5        28129  Software   Buy     False

*Note this only works if the MultiIndex is unique (like the shown example). It will also add rows if the dictionary does not represent a subset of the DataFrame's MultiIndex (which may or may not be the desired behaviour).


Setup:

import pandas as pd

d = {
    40275: ['Book', 'Software'],
    39900: ['Book'],
    35886: ['Software'],
    40350: ['Software'],
    28129: ['Software']
}

df = pd.DataFrame({
    'Customer_ID': [40275, 40275, 40275, 40275, 39900, 39900, 39900, 35886,
                    35886, 35886, 35886, 40350, 28129],
    'Category': ['Book', 'Software', 'Video Game', 'Cell Phone', 'CD/DVD',
                 'Book', 'Software', 'Cell Phone', 'Video Game', 'CD/DVD',
                 'Software', 'Software', 'Software'],
    'Type': ['Buy', 'Sell', 'Sell', 'Sell', 'Sell', 'Buy', 'Sell', 'Sell',
             'Buy', 'Sell', 'Sell', 'Sell', 'Buy'],
    'Delivery': [True, False, False, False, True, True, True, False, False,
                 False, False, True, False]
})

Upvotes: 2

Shubham Sharma
Shubham Sharma

Reputation: 71689

Flatten the dictionary and create a new dataframe, then inner merge df with the new dataframe

df.merge(pd.DataFrame([{'Customer_ID': k, 'Category': i} 
                       for k, v in d.items() for i in v]))

   Customer_ID  Category  Type  Delivery
0        40275      Book   Buy      True
1        40275  Software  Sell     False
2        39900      Book   Buy      True
3        35886  Software  Sell     False
4        40350  Software  Sell      True
5        28129  Software   Buy     False

Upvotes: 6

Mayank Porwal
Mayank Porwal

Reputation: 34046

You can use df.merge with df.append:

In [444]: df1 = pd.DataFrame.from_dict(d, orient='index', columns=['Cat1', 'Cat2']).reset_index()

In [449]: res = df.merge(df1[['index', 'Cat1']], left_on=['Customer_ID', 'Category'], right_on=['index', 'Cat1']).drop(['index', 'Cat1'], 1)

In [462]: res = res.append(df.merge(df1[['index', 'Cat2']], left_on=['Customer_ID', 'Category'], right_on=['index', 'Cat2']).drop(['index', 'Cat2'], 1)).sort_values('Customer_ID', ascending=False)

In [463]: res
Out[463]: 
   Customer_ID  Category  Type  Delivery
3        40350  Software  Sell      True
0        40275      Book   Buy      True
0        40275  Software  Sell     False
1        39900      Book   Buy      True
2        35886  Software  Sell     False
4        28129  Software   Buy     False

Upvotes: 1

Related Questions