Reputation: 93
I'd like to create filter expressions using all items in a list so that I can query a pandas dataframe.
Please use the example code below:
col = ['A','B']
data = [[0,5],[3,2],[1,9],[2,7]]
df = pd.DataFrame(data, columns=col)
fltr = [('A', 1),
('A', 1, 'B', 5)]
I'd like to rewrite the code below in a loop so that I don't need to write every possible combination myself:
f = ['{0}>={1}'.format(fltr[0][0], fltr[0][1])]
df.query(f)
f = ['{0}<={1}'.format(fltr[0][0], fltr[0][1])]
df.query(f)
f = ['{0}>={1} and {2}>={3}'.format(fltr[1][0], fltr[1][1], fltr[1][2], fltr[1][3])]
df.query(f)
...
Also note that I'd like to query every combination of '>=' and '<=' for each expression
Desired expressions:
A>=1, A<=1, A>=1 and B>=5, A>=1 and B<=5, A<=1 and B>=5, A<=1 and B<=5
Upvotes: 1
Views: 353
Reputation: 36643
You are going to need several parts to get the result you want. You need a function to split your tuples into series of 2-tuples. The product
function from the itertools
package can handle creating the combinations of column, operation, and value. You can then groupby
the column, and use product
again to get all of the filter combinations. After that you can merge the strings using .join
and pass the filters to df.eval
or df.query
.
import pandas as pd
from itertools import product, groupby
ge, le = '>=', '<='
def split_tuple(tup):
return [tup[i:i+2] for i in range(0, len(tup), 2)]
def create_basic_filters(tup):
f_gen = product(split_tuple(tup), (ge, le))
return [f'{col}{op}{v}' for (col, v), op in f_gen]
def create_filter_groups(basic_filters):
gb = groupby(basic_filters, key=lambda x: x[:x.find('=')-1])
groupings = [tuple(g) for _, g in gb]
return [' & '.join(f) for f in product(*groupings)]
# test the filter creation
fltr = [('A', 1), ('A', 2, 'B', 3), ('A', 1, 'B', 5, 'C', 7),
('A', 0, 'B', 2, 'C', 10, 'D', 15)]
basic_filters = create_basic_filters(fltr[2])
filter_strs = create_filter_groups(basic_filters)
filter_strs
# returns:
['A>=1 & B>=5 & C>=7',
'A>=1 & B>=5 & C<=7',
'A>=1 & B<=5 & C>=7',
'A>=1 & B<=5 & C<=7',
'A<=1 & B>=5 & C>=7',
'A<=1 & B>=5 & C<=7',
'A<=1 & B<=5 & C>=7',
'A<=1 & B<=5 & C<=7']
To actually use the filter we can pass them to df.eval
to get a boolean Series, or to df.query
if you want to also slice the data frame. Below we print the resulting slice of each data frame using a for-loop.
df = pd.DataFrame({'A': [0, 3, 1, 2],
'B': [5, 2, 9, 5],
'C': [3, 7, 11, 5],
'D': [12, 21, 13, 19]})
for f in filter_strs:
print('='*30)
print('Filter =', f)
print(df.query(f))
# prints:
==============================
Filter = A>=1 & B>=5 & C>=7
A B C D
2 1 9 11 13
==============================
Filter = A>=1 & B>=5 & C<=7
A B C D
3 2 5 5 19
==============================
Filter = A>=1 & B<=5 & C>=7
A B C D
1 3 2 7 21
==============================
Filter = A>=1 & B<=5 & C<=7
A B C D
1 3 2 7 21
3 2 5 5 19
==============================
Filter = A<=1 & B>=5 & C>=7
A B C D
2 1 9 11 13
==============================
Filter = A<=1 & B>=5 & C<=7
A B C D
0 0 5 3 12
==============================
Filter = A<=1 & B<=5 & C>=7
Empty DataFrame
Columns: [A, B, C, D]
Index: []
==============================
Filter = A<=1 & B<=5 & C<=7
A B C D
0 0 5 3 12
Upvotes: 1