Pavel Soroka

Reputation: 51

Filter Pandas Dataframe columns with a complex condition generated by a predicate function (defined on columns)

I would like to drop the columns of a pandas DataFrame for which a predicate function, applied to each column, returns True. For example (in general the predicate may be much more sophisticated, with rather complex dependencies between different elements of the series):

def detect_jumps(data, jump_factor=5):
    for i in range(1, len(data)):
        if data[i] - data[i - 1] > jump_factor:
            return True
    return False

on a dataframe df:

import pandas as pd
data = [
    {'A': '10', 'B': '10', 'C': '100', 'D': '100', 'E': '0', },
    {'A': '15', 'B': '16', 'C': '105', 'D': '104', 'E': '10', },
    {'A': '20', 'B': '20', 'C': '110', 'D': '110', 'E': '11', },
]
df = pd.DataFrame(data)

i.e.

    A   B   C   D   E
0   10  10  100 100 0
1   15  16  105 104 10
2   20  20  110 110 11

With the default jump_factor=5, it should filter out columns B (col[1] - col[0] == 6 > 5), D (col[2] - col[1] == 6 > 5) and E (col[1] - col[0] == 10 > 5).

With the predicate detect_jumps(data, 9) instead, it should filter out only column E (col[1] - col[0] == 10 > 9).

Is there a way to use such a function as a condition for filtering columns?

Upvotes: 0

Views: 473

Answers (1)

mozway

Reputation: 261820

You don't need a custom function; vectorized operations are enough:

df2 = df.loc[:, ~df.astype(int).diff().gt(5).any()]

Output:

    A    C
0  10  100
1  15  105
2  20  110
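To see why the one-liner works, the chain can be unpacked step by step. This is only a sketch: it rebuilds the question's df, and the names threshold, numeric, steps, has_jump, kept_5 and kept_9 are introduced here for illustration. It also checks the second case from the question, a jump factor of 9.

```python
import pandas as pd

df = pd.DataFrame([
    {'A': '10', 'B': '10', 'C': '100', 'D': '100', 'E': '0'},
    {'A': '15', 'B': '16', 'C': '105', 'D': '104', 'E': '10'},
    {'A': '20', 'B': '20', 'C': '110', 'D': '110', 'E': '11'},
])

threshold = 5  # illustrative name for the question's jump_factor

numeric = df.astype(int)               # the sample values are strings
steps = numeric.diff()                 # row-to-row difference per column; first row is NaN
has_jump = steps.gt(threshold).any()   # True where a column has at least one step > threshold

kept_5 = df.loc[:, ~has_jump]                      # keep columns without a jump
kept_9 = df.loc[:, ~numeric.diff().gt(9).any()]    # same chain with a factor of 9

print(list(kept_5.columns))  # ['A', 'C']
print(list(kept_9.columns))  # ['A', 'B', 'C', 'D']
```

The NaN in the first row of diff() compares as False in gt(), so it never counts as a jump.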

Nevertheless, using your function:

df2 = df.loc[:, [not detect_jumps(c) for label, c in df.astype(int).items()]]

# OR
df2 = df[[label for label, c in df.astype(int).items() if not detect_jumps(c)]]
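If the predicate needs a non-default argument, one possible variation (not part of the answer above) is DataFrame.apply, which calls the function once per column and forwards extra keyword arguments such as jump_factor. In this sketch the predicate is repeated with .iloc, a small change from the question's version, so that it does not depend on the index labels; mask is a name introduced here for illustration.

```python
import pandas as pd

def detect_jumps(data, jump_factor=5):
    # Same logic as in the question; .iloc makes the access positional
    for i in range(1, len(data)):
        if data.iloc[i] - data.iloc[i - 1] > jump_factor:
            return True
    return False

df = pd.DataFrame([
    {'A': '10', 'B': '10', 'C': '100', 'D': '100', 'E': '0'},
    {'A': '15', 'B': '16', 'C': '105', 'D': '104', 'E': '10'},
    {'A': '20', 'B': '20', 'C': '110', 'D': '110', 'E': '11'},
])

# apply() runs the predicate on each column and passes jump_factor through
mask = df.astype(int).apply(detect_jumps, jump_factor=9).astype(bool)

df2 = df.loc[:, ~mask]  # keep only the columns where the predicate is False
print(list(df2.columns))  # ['A', 'B', 'C', 'D']
```

This stays a Python-level loop over each column, so for simple conditions the vectorized form will usually be faster on large frames.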

Upvotes: 2
