Reputation: 529
I am trying to find out how to check if there is any row within my dataframe which has every item reflected in a dictionary. The dictionary may have more items which are not expressed in the dataframe rows.
For example,
import pandas as pd
validCombinations = { 'combo1':{ 'prop1': 'X', 'prop2':'Y', 'prop3': 'Z' },
'combo2':{ 'prop1': 'U', 'prop3': 'Z' },
'combo3':{ 'prop1': 'X', 'prop2':'X' },
'combo4':{ 'prop1': 'X', 'prop2':'Y', 'prop3': 'U' }
}
df = pd.DataFrame.from_dict( { k: validCombinations[k]
for k in validCombinations.keys() },
orient='index')
print df
>>
prop1 prop2 prop3
combo1 X Y Z
combo2 U NaN Z
combo3 X X NaN
combo4 X Y U
mydict = { 'prop1': 'X', 'prop2':'Y', 'prop3': 'Z', 'prop4': 'V' }
I now want a function that returns True or False depending on whether a complete row from df was expressed by mydict,
dfRowMatchedMyDict( df, mydict)
--> this should return True since combo1 in df has all items expressed in mydict
I am really not sure if I should be converting mydict to a dataframe first, or how to go about solving this in a clean way?
Upvotes: 2
Views: 53
Reputation: 402854
Option 1
df.isin
+ df.all
-
df[df.isin(list(mydict.values())).all(1)]
prop1 prop2 prop3
combo1 X Y Z
Option 2
Using piR's hack with assign
, filter on valid columns only and test with eq
-
df[df.eq(df.assign(**mydict)[df.columns]).all(1)]
prop1 prop2 prop3
combo1 X Y Z
Upvotes: 2
Reputation: 210912
Try this:
In [116]: qry = ' and '.join(['{} == "{}"'.format(k,v) for k,v in mydict.items() if k in df.columns.tolist()])
In [117]: df.query(qry)
Out[117]:
prop1 prop2 prop3
combo1 X Y Z
or:
In [121]: df.eval(qry)
Out[121]:
combo1 True
combo2 False
combo3 False
combo4 False
dtype: bool
Details:
In [119]: qry
Out[119]: 'prop1 == "X" and prop2 == "Y" and prop3 == "Z"'
UPDATE:
if your dict would contain only valid columns we could do the following:
In [137]: mydict2 = { 'prop1': 'X', 'prop2':'Y', 'prop3': 'Z'}
In [138]: df.eq(pd.Series(mydict2), axis=1)
Out[138]:
prop1 prop2 prop3
combo1 True True True
combo2 False False True
combo3 True False False
combo4 False True False
In [139]: df.eq(pd.Series(mydict2), axis=1).all(1)
Out[139]:
combo1 True
combo2 False
combo3 False
combo4 False
dtype: bool
or using original dictionary:
In [141]: flt = pd.Series(mydict).loc[df.columns]
In [142]: flt
Out[142]:
prop1 X
prop2 Y
prop3 Z
dtype: object
In [143]: df.eq(flt, axis=1).all(1)
Out[143]:
combo1 True
combo2 False
combo3 False
combo4 False
dtype: bool
Upvotes: 2
Reputation: 420
Another option:
def dfRowMatchedMyDict(df,mydict):
cols = set(mydict) & set(df.columns)
trues = np.zeros(len(df))
for column in cols:
trues += df[column] == mydict[column]
return trues == len(cols)
Out
dfRowMatchedMyDict(df,mydict)
combo1 True
combo2 False
combo3 False
combo4 False
Name: prop1, dtype: bool
Upvotes: 0