gbox

Reputation: 829

Iteration with multiple conditions in Pandas

In part of my code, I search for subsets of a DataFrame in order to manipulate them later. The part that takes a very long time is as follows:

for record in records.itertuples():
    matches_ids = df[(df['column_1'] < record.attribute_1) &
                     (record.attribute_2 < df['column_2']) &
                     (df['column_3'] < record.attribute_3) &
                     (record.attribute_4 == df['column_4']) &
                     (df['column_5'] != 'value')].index

Is there a way to reduce the complexity of the code?

Expected output: a list of the indices that satisfy all conditions.

P.S. Removing each condition reduces the runtime almost 10-fold.

Upvotes: 0

Views: 59

Answers (1)

Corralien

Reputation: 120509

You can perform a cross merge of the two DataFrames, then filter the result with query:

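# 'value' must be quoted inside the query string; unquoted, query() would
# look for a column or variable named value instead of comparing to the string.
qs = "(column_1 < attribute_1) \
        & (attribute_2 < column_2) \
        & (column_3 < attribute_3) \
        & (attribute_4 == column_4) \
        & (column_5 != 'value')"

# records is the second DataFrame from the question (the attribute_* columns)
df.merge(records, how='cross').query(qs)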
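
As a rough illustration of the cross merge and query approach above, here is a minimal sketch on made-up toy data. The column and attribute names follow the question; the values, the custom index, and the helper column names df_index and record_index are assumptions added for the example. Because a merge discards the original indices, both are first moved into ordinary columns so the matching df indices can be recovered per record. It assumes pandas 1.2 or later for how='cross'.

```python
import pandas as pd

# Toy data: names follow the question, values are invented for illustration.
df = pd.DataFrame(
    {
        "column_1": [1, 5, 2, 9],
        "column_2": [10, 20, 30, 40],
        "column_3": [0, 1, 2, 3],
        "column_4": ["a", "a", "b", "a"],
        "column_5": ["x", "value", "x", "x"],
    },
    index=[100, 101, 102, 103],
)
records = pd.DataFrame(
    {
        "attribute_1": [3, 10],
        "attribute_2": [5, 15],
        "attribute_3": [5, 5],
        "attribute_4": ["a", "a"],
    }
)

qs = (
    "(column_1 < attribute_1)"
    " & (attribute_2 < column_2)"
    " & (column_3 < attribute_3)"
    " & (attribute_4 == column_4)"
    " & (column_5 != 'value')"
)

# Move both indices into columns (hypothetical names) so they survive the merge.
left = df.rename_axis("df_index").reset_index()
right = records.rename_axis("record_index").reset_index()

# Every df row is paired with every record, then filtered in one pass.
# engine="python" is optional; it avoids depending on numexpr being installed.
matches = left.merge(right, how="cross").query(qs, engine="python")

# Collect the matching df indices for each record.
matches_by_record = (
    matches.groupby("record_index")["df_index"].apply(list).to_dict()
)
print(matches_by_record)
```

Note that a cross merge materializes len(df) * len(records) rows before filtering, so it trades memory for speed, and records with no match do not appear in the resulting dictionary.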

Upvotes: 1
