How to subset a pandas data-frame based on a dictionary using case-insensitive matching

Question

I have a dataframe which contains various products and their descriptions as shown in the image below:

I have a dict which contains the key-value pairs based on which the filtering has to be done:

ent_dict

{'brand': 'Dexter', 'color': 'brown', 'product': 'footwear', 'size': '32'}

As can be seen the dict and dataframe might contain values in different cases and hence I need to do case-insensitive matching here. Also there might be columns that are numeric for which normal matching will do. So can someone please help me in this.

praba230890 · Accepted Answer

The above works for string matches. You can further change the final statement to match the integers too.

import numpy as np
import pandas as pd
import re

df = pd.DataFrame({'Product': np.array(['Footwear' for i in range(4)]), 'Category': np.array(['Women' for i in range(4)]), 'Size': np.array([7, 7, 7, 8]), 'Color': np.array(['black', 'brown', 'blue', 'black'])})

ent_dict = {'Category': 'Women', 'Color': 'black', 'Product': 'Footwear'}
values = [i for i in ent_dict.values()]
columns = [df.filter(regex=re.compile(i, re.IGNORECASE)).columns[0] for i in ent_dict]

df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond)) 
            for col, cond in zip(columns, values)]))]

The case insensitive search can be accomplished using str.contains of DataFrame object.

df[eval(" & ".join(["(df['{0}'].str.contains({1}, case=False))".format(col, repr(cond)) 
            for col, cond in zip(columns, values)]))]

How to subset a pandas data-frame based on a dictionary using case-insensitive matching

Answers (1)

Related Questions