Reputation: 1048
I have a dataframe which contains various products and their descriptions as shown in the image below:
I have a dict which contains the key-value pairs based on which the filtering has to be done:
ent_dict
{'brand': 'Dexter', 'color': 'brown', 'product': 'footwear', 'size': '32'}
As can be seen the dict and dataframe might contain values in different cases and hence I need to do case-insensitive matching here. Also there might be columns that are numeric for which normal matching will do. So can someone please help me in this.
Upvotes: 1
Views: 645
Reputation: 2310
The above works for string matches. You can further change the final statement to match the integers too.
import numpy as np
import pandas as pd
import re
df = pd.DataFrame({'Product': np.array(['Footwear' for i in range(4)]), 'Category': np.array(['Women' for i in range(4)]), 'Size': np.array([7, 7, 7, 8]), 'Color': np.array(['black', 'brown', 'blue', 'black'])})
ent_dict = {'Category': 'Women', 'Color': 'black', 'Product': 'Footwear'}
values = [i for i in ent_dict.values()]
columns = [df.filter(regex=re.compile(i, re.IGNORECASE)).columns[0] for i in ent_dict]
df[eval(" & ".join(["(df['{0}'] == {1})".format(col, repr(cond))
for col, cond in zip(columns, values)]))]
The case insensitive search can be accomplished using str.contains
of DataFrame
object.
df[eval(" & ".join(["(df['{0}'].str.contains({1}, case=False))".format(col, repr(cond))
for col, cond in zip(columns, values)]))]
Upvotes: 2