Shuvayan Das
Shuvayan Das

Reputation: 1048

How to subset a pandas data-frame based on a dictionary using case-insensitive matching

I have a dataframe which contains various products and their descriptions as shown in the image below:

enter image description here

I have a dict which contains the key-value pairs based on which the filtering has to be done:

ent_dict

{'brand': 'Dexter', 'color': 'brown', 'product': 'footwear', 'size': '32'}

As can be seen the dict and dataframe might contain values in different cases and hence I need to do case-insensitive matching here. Also there might be columns that are numeric for which normal matching will do. So can someone please help me in this.

Upvotes: 1

Views: 645

Answers (1)

praba230890
praba230890

Reputation: 2310

The above works for string matches. You can further change the final statement to match the integers too.

import numpy as np
import pandas as pd
import re

df = pd.DataFrame({'Product': np.array(['Footwear' for i in range(4)]), 'Category': np.array(['Women' for i in range(4)]), 'Size': np.array([7, 7, 7, 8]), 'Color': np.array(['black', 'brown', 'blue', 'black'])})

ent_dict = {'Category': 'Women', 'Color': 'black', 'Product': 'Footwear'}
values = [i for i in ent_dict.values()]
columns = [df.filter(regex=re.compile(i, re.IGNORECASE)).columns[0] for i in ent_dict]

df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond)) 
            for col, cond in zip(columns, values)]))]

The case insensitive search can be accomplished using str.contains of DataFrame object.

df[eval(" & ".join(["(df['{0}'].str.contains({1}, case=False))".format(col, repr(cond)) 
            for col, cond in zip(columns, values)]))]

Upvotes: 2

Related Questions