Ali
Ali

Reputation: 93

Python iteratively create filter expression using items in list

I'd like to create filter expressions using all items in a list so that I can query a pandas dataframe.

Please use the example code below:

col = ['A','B']
data = [[0,5],[3,2],[1,9],[2,7]]
df = pd.DataFrame(data, columns=col)

fltr = [('A', 1),
 ('A', 1, 'B', 5)]

I'd like to rewrite the code below in a loop so that I don't need to write every possible combination myself:

f = ['{0}>={1}'.format(fltr[0][0], fltr[0][1])]
df.query(f)
f = ['{0}<={1}'.format(fltr[0][0], fltr[0][1])]
df.query(f)
f = ['{0}>={1} and {2}>={3}'.format(fltr[1][0], fltr[1][1], fltr[1][2], fltr[1][3])]
df.query(f)
...

Also note that I'd like to query every combination of '>=' and '<=' for each expression

Desired expressions:

A>=1, A<=1, A>=1 and B>=5, A>=1 and B<=5, A<=1 and B>=5, A<=1 and B<=5

Upvotes: 1

Views: 353

Answers (1)

James
James

Reputation: 36643

You are going to need several parts to get the result you want. You need a function to split your tuples into series of 2-tuples. The product function from the itertools package can handle creating the combinations of column, operation, and value. You can then groupby the column, and use product again to get all of the filter combinations. After that you can merge the strings using .join and pass the filters to df.eval or df.query.

import pandas as pd
from itertools import product, groupby

ge, le = '>=', '<='

def split_tuple(tup):
    return [tup[i:i+2] for i in range(0, len(tup), 2)]

def create_basic_filters(tup):
    f_gen = product(split_tuple(tup), (ge, le))
    return [f'{col}{op}{v}' for (col, v), op in f_gen]

def create_filter_groups(basic_filters):
    gb = groupby(basic_filters, key=lambda x: x[:x.find('=')-1])
    groupings = [tuple(g) for _, g in gb]
    return [' & '.join(f) for f in product(*groupings)]


# test the filter creation
fltr = [('A', 1), ('A', 2, 'B', 3), ('A', 1, 'B', 5, 'C', 7), 
        ('A', 0, 'B', 2, 'C', 10, 'D', 15)]

basic_filters = create_basic_filters(fltr[2])
filter_strs = create_filter_groups(basic_filters)
filter_strs
# returns:
['A>=1 & B>=5 & C>=7',
 'A>=1 & B>=5 & C<=7',
 'A>=1 & B<=5 & C>=7',
 'A>=1 & B<=5 & C<=7',
 'A<=1 & B>=5 & C>=7',
 'A<=1 & B>=5 & C<=7',
 'A<=1 & B<=5 & C>=7',
 'A<=1 & B<=5 & C<=7']

To actually use the filter we can pass them to df.eval to get a boolean Series, or to df.query if you want to also slice the data frame. Below we print the resulting slice of each data frame using a for-loop.

df = pd.DataFrame({'A': [0, 3, 1, 2],
                   'B': [5, 2, 9, 5],
                   'C': [3, 7, 11, 5],
                   'D': [12, 21, 13, 19]})

for f in filter_strs:
    print('='*30)
    print('Filter =', f)
    print(df.query(f))

# prints:
==============================
Filter = A>=1 & B>=5 & C>=7
   A  B   C   D
2  1  9  11  13
==============================
Filter = A>=1 & B>=5 & C<=7
   A  B  C   D
3  2  5  5  19
==============================
Filter = A>=1 & B<=5 & C>=7
   A  B  C   D
1  3  2  7  21
==============================
Filter = A>=1 & B<=5 & C<=7
   A  B  C   D
1  3  2  7  21
3  2  5  5  19
==============================
Filter = A<=1 & B>=5 & C>=7
   A  B   C   D
2  1  9  11  13
==============================
Filter = A<=1 & B>=5 & C<=7
   A  B  C   D
0  0  5  3  12
==============================
Filter = A<=1 & B<=5 & C>=7
Empty DataFrame
Columns: [A, B, C, D]
Index: []
==============================
Filter = A<=1 & B<=5 & C<=7
   A  B  C   D
0  0  5  3  12

Upvotes: 1

Related Questions