Filter Pandas Dataframe columns with a complex condition generated by a predicate function (defined on columns)

Question

I would like to filter out columns of a pandas DataFrame using a predicate function that is evaluated on each column. For example (in general the predicate may be much more sophisticated, with rather complex dependencies between different elements of the series):

def detect_jumps(data, jump_factor=5):
    for i in range(1, len(data)):
        if data[i] - data[i - 1] > jump_factor:
            return True
    return False

on a dataframe df:

import pandas as pd
data = [
    # Values are integers so that the subtraction in detect_jumps works.
    {'A': 10, 'B': 10, 'C': 100, 'D': 100, 'E': 0},
    {'A': 15, 'B': 16, 'C': 105, 'D': 104, 'E': 10},
    {'A': 20, 'B': 20, 'C': 110, 'D': 110, 'E': 11},
]
df = pd.DataFrame(data)

i.e.

    A   B    C    D   E
0  10  10  100  100   0
1  15  16  105  104  10
2  20  20  110  110  11

With the default jump_factor=5, it should filter out columns B (col[1] - col[0] == 6 > 5), D (col[2] - col[1] == 6 > 5) and E (col[1] - col[0] == 10 > 5).

With the predicate detect_jumps(data, 9), it should filter out only column E (col[1] - col[0] == 10 > 9).

Is there a way to use such a function as the condition for filtering columns?
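
For reference, one possible approach is sketched below. It assumes the column values are numeric (the sample frame is rebuilt here with integers), and the names has_jump, has_jump_vectorized and filtered are only illustrative. The idea is to let DataFrame.apply call the predicate once per column, which yields a boolean Series indexed by column name, and then use the negated mask with .loc to keep the remaining columns.

```python
import pandas as pd


def detect_jumps(data, jump_factor=5):
    # Same predicate as in the question; .iloc makes the access positional,
    # so it does not depend on the index labels being 0..n-1.
    for i in range(1, len(data)):
        if data.iloc[i] - data.iloc[i - 1] > jump_factor:
            return True
    return False


# Sample frame from the question, with integer values (an assumption).
df = pd.DataFrame({
    'A': [10, 15, 20],
    'B': [10, 16, 20],
    'C': [100, 105, 110],
    'D': [100, 104, 110],
    'E': [0, 10, 11],
})

# With the default axis=0, apply passes each column to the predicate as a
# Series; extra keyword arguments are forwarded to the predicate.
has_jump = df.apply(detect_jumps, jump_factor=5)

# Keep only the columns for which the predicate returned False.
filtered = df.loc[:, ~has_jump]

# For this particular predicate, a vectorized mask gives the same result.
has_jump_vectorized = (df.diff() > 5).any()
```

The apply version should work for any predicate that takes a Series and returns a bool, at the cost of a Python-level call per column. The diff-based mask only covers this specific "consecutive difference" rule.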
