Reputation: 437
Sample filtering condition:
x y z
1 2 1
1 3 2
1 2 5
1 3 1
Now I want to filter the given data by the conditions specified above. For that I need a generic function, i.e. one that works for any filters, not only the ones shown above.
I know how to filter data manually in Python for more than one condition.
I think a generic function would need two arguments: the data and the filtering conditions.
But I am unable to work out the logic for writing a generic function to filter the data.
Can anyone help me tackle this?
Thanks in advance.
Upvotes: 2
Views: 2096
Reputation: 437
import pandas as pd

def filter_function(df,filter_df):
    lvl_=list()   # column names taken from the first filter row
    lvl=list()    # values of the filter row being read
    vlv=list()    # one list of values per filter row
    df1=pd.DataFrame()
    n=filter_df.apply(lambda x: x.tolist(), axis=1)
    for i in range(0,len(n)):
        for j in range(0,len(n[i])):
            if i==0:
                lvl_.append(n[i][j].split('==')[0])
            lvl.append(n[i][j].split('==')[1])
            if len(lvl)==len(n[i]):
                vlv.append(lvl)
                lvl=list()
    final_df=df[lvl_]
    for k in range(0,len(vlv)):
        # DataFrame.append was removed in pandas 2.0, so use pd.concat
        df1=pd.concat([df1,final_df[final_df.isin(vlv[k])].dropna()])
    return(df1)
filter_function(df,filter_df)
Upvotes: 1
Reputation: 863301
You can create a list of conditions and then combine them with np.logical_and.reduce:
x1 = df.x==1
y2 = df.y==2
z1 = df.z==1
y3 = df.y==3
m1 = np.logical_and.reduce([x1, y2, z1])
m2 = np.logical_and.reduce([x1, y3, z1])
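To see the masks in action, here is a minimal self-contained sketch. It assumes that df is the four-row table with columns x, y and z shown in the question (rebuilt here for illustration), and that rows matching either filter row should be kept, so the two masks are combined with OR.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the four rows shown in the question.
df = pd.DataFrame({'x': [1, 1, 1, 1],
                   'y': [2, 3, 2, 3],
                   'z': [1, 2, 5, 1]})

# One boolean Series per condition.
x1 = df.x == 1
y2 = df.y == 2
z1 = df.z == 1
y3 = df.y == 3

# AND the conditions that belong to the same filter row.
m1 = np.logical_and.reduce([x1, y2, z1])
m2 = np.logical_and.reduce([x1, y3, z1])

# Keep rows that satisfy either filter row (assumption: filter rows are alternatives).
res = df[m1 | m2]
print(res)
```

If only one filter row matters, df[m1] or df[m2] selects its rows directly.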
Or concat all the masks together and check that every value in a row is True with DataFrame.all:
m1 = pd.concat([x1, y2, z1], axis=1).all(axis=1)
m2 = pd.concat([x1, y3, z1], axis=1).all(axis=1)
EDIT:
If it is possible to define the column names and the values to filter by in a dictionary:
d1 = {'x':1, 'y':2, 'z':1}
d2 = {'x':1, 'y':3, 'z':1}
m1 = np.logical_and.reduce([df[k] == v for k, v in d1.items()])
m2 = np.logical_and.reduce([df[k] == v for k, v in d2.items()])
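The dictionary form can be wrapped in a reusable function along these lines. This is only a sketch: the name filter_by_dicts is hypothetical, the sample frame is rebuilt from the question, each dictionary is assumed to be non-empty, and a row is kept if it matches any one of the dictionaries.

```python
import numpy as np
import pandas as pd

def filter_by_dicts(df, dicts):
    """Keep rows of df matching every key/value pair of at least one dict."""
    # AND the equality checks inside each dictionary.
    masks = [np.logical_and.reduce([df[k] == v for k, v in d.items()])
             for d in dicts]
    # OR the per-dictionary masks together.
    return df[np.logical_or.reduce(masks)]

# Assumed sample data: the four rows shown in the question.
df = pd.DataFrame({'x': [1, 1, 1, 1],
                   'y': [2, 3, 2, 3],
                   'z': [1, 2, 5, 1]})

d1 = {'x': 1, 'y': 2, 'z': 1}
d2 = {'x': 1, 'y': 3, 'z': 1}

res = filter_by_dicts(df, [d1, d2])
print(res)
```

Because each dictionary only names the columns it cares about, a filter such as {'y': 3} works without mentioning x or z.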
Another approach is to merge with a one-row DataFrame created from the dictionary:
df1 = pd.DataFrame([d1]).merge(df)
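A short sketch of the merge idea with more than one filter row follows. The sample frame is rebuilt from the question, and the variable names filters and kept are only illustrative. One thing to be aware of is that a plain merge returns a fresh 0..n-1 index, so the second variant carries the original row labels through the merge.

```python
import pandas as pd

# Assumed sample data: the four rows shown in the question.
df = pd.DataFrame({'x': [1, 1, 1, 1],
                   'y': [2, 3, 2, 3],
                   'z': [1, 2, 5, 1]})

d1 = {'x': 1, 'y': 2, 'z': 1}
d2 = {'x': 1, 'y': 3, 'z': 1}

# One row per filter; the default inner merge joins on all shared columns.
filters = pd.DataFrame([d1, d2])
res = filters.merge(df)
print(res)

# Variant that keeps the original row labels of df.
kept = df.reset_index().merge(filters).set_index('index')
print(kept)
```

This only covers equality filters; conditions such as y<=3 need boolean masks instead.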
EDIT:
For a general solution, it is possible to parse each value of the file into a tuple and use operators:
df1 = pd.DataFrame({0: ['x==1', 'x==1'], 1: ['y==2', 'y<=3'], 2: ['z!=1', 'z>1']})
print (df1)
0 1 2
0 x==1 y==2 z!=1
1 x==1 y<=3 z>1
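import operator
import re

# Map each comparison symbol to its function.
ops = {'>': operator.gt,
       '<': operator.lt,
       '>=': operator.ge,
       '<=': operator.le,
       '==': operator.eq,
       '!=': operator.ne}

# If the value is numeric, parse it to float; otherwise leave it untouched (e.g. a string).
def try_num(x):
    try:
        return float(x)
    except ValueError:
        return x

# One dictionary per row of conditions ('records' replaces the removed shorthand 'r').
L = df1.to_dict('records')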
#https://stackoverflow.com/q/52620865/2901002
rgx = re.compile(r'([<>=!]+)')
parsed = [[rgx.split(v) for v in d.values()] for d in L]
L = [[(x, op, try_num(y)) for x,op,y in ps] for ps in parsed]
print (L)
[[('x', '==', 1.0), ('y', '==', 2.0), ('z', '!=', 1.0)],
[('x', '==', 1.0), ('y', '<=', 3.0), ('z', '>', 1.0)]]
And now filter by the first item of the list, i.e. the first row of the file:
m = np.logical_and.reduce([ops[j](df[i], k) for i, j, k in L[0]])
print (m)
[False False True False]
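Putting the pieces together, a generic two-argument function of the kind the question asks for might look like the sketch below. The names filter_by_conditions and row_mask are hypothetical, the sample data is rebuilt from the question, and it is an assumption that a data row should be kept when it satisfies all conditions of at least one filter row.

```python
import operator
import re

import numpy as np
import pandas as pd

OPS = {'>': operator.gt, '<': operator.lt, '>=': operator.ge,
       '<=': operator.le, '==': operator.eq, '!=': operator.ne}
RGX = re.compile(r'([<>=!]+)')

def try_num(x):
    """Parse numeric strings to float; leave anything else untouched."""
    try:
        return float(x)
    except ValueError:
        return x

def row_mask(df, conditions):
    """AND together condition strings such as 'x==1' or 'y<=3'."""
    parts = [RGX.split(c) for c in conditions]
    return np.logical_and.reduce(
        [OPS[op](df[col], try_num(val)) for col, op, val in parts])

def filter_by_conditions(df, filter_df):
    """Keep rows of df satisfying every condition in at least one filter row."""
    masks = [row_mask(df, row) for row in filter_df.values.tolist()]
    return df[np.logical_or.reduce(masks)]

# Assumed sample data: the four rows shown in the question.
df = pd.DataFrame({'x': [1, 1, 1, 1],
                   'y': [2, 3, 2, 3],
                   'z': [1, 2, 5, 1]})
filter_df = pd.DataFrame({0: ['x==1', 'x==1'],
                          1: ['y==2', 'y<=3'],
                          2: ['z!=1', 'z>1']})

res = filter_by_conditions(df, filter_df)
print(res)
```

The sketch expects condition strings without surrounding whitespace and raises a KeyError for an operator that is not in OPS.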
Upvotes: 2
Reputation: 164783
Since you have a single numeric dtype, you can use the underlying NumPy array:
res = df[(df.values == [1, 2, 1]).all(1)]
print(res)
x y z
0 1 2 1
For a generic function with list input:
def filter_df(df, L):
    return df[(df.values == L).all(1)]
res = filter_df(df, [1, 2, 1])
If you need a dictionary input:
def filter_df(df, d):
    L = list(map(d.get, df))
    return df[(df.values == L).all(1)]
res = filter_df(df, {'x': 1, 'y': 2, 'z': 1})
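The dictionary version above looks up every column of df in d, so a column missing from the dictionary contributes None to the comparison list and would not be expected to match. If the dictionary may name only some of the columns, one possible variation is to compare just those columns. The name filter_df_subset is hypothetical and the sample frame is rebuilt from the question.

```python
import pandas as pd

def filter_df_subset(df, d):
    """Compare only the columns named in d, using the NumPy values."""
    cols = list(d)
    wanted = [d[c] for c in cols]
    # Broadcasting compares each row of the selected columns against `wanted`.
    return df[(df[cols].values == wanted).all(1)]

# Assumed sample data: the four rows shown in the question.
df = pd.DataFrame({'x': [1, 1, 1, 1],
                   'y': [2, 3, 2, 3],
                   'z': [1, 2, 5, 1]})

res_full = filter_df_subset(df, {'x': 1, 'y': 2, 'z': 1})
res_part = filter_df_subset(df, {'x': 1, 'z': 1})
print(res_full)
print(res_part)
```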
Upvotes: 1