In part of my code, I search for subsets of a DataFrame in order to manipulate them later. The part that takes a very long time is the following:
for record in records.itertuples():
    matches_ids = df[((df['column_1'] < record.attribute_1) &
                      (record.attribute_2 < df['column_2']) &
                      (df['column_3'] < record.attribute_3) &
                      (record.attribute_4 == df['column_4']) &
                      (df['column_5'] != 'value'))].index
Is there a way to speed this up or reduce its complexity?
Expected output: a list of the indices that satisfy all conditions.
P.S. Each condition I remove cuts the runtime by almost 10-fold.
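For reference, a self-contained version of the loop above that collects the matches for every record instead of overwriting matches_ids on each iteration. The toy frames and the helper name matches_by_loop are hypothetical, chosen only so the sketch runs; the filter itself is the same five conditions as in the question.

```python
import pandas as pd

# Hypothetical toy data: df holds the rows being searched, records holds the queries.
df = pd.DataFrame(
    {
        "column_1": [1, 5, 2, 8],
        "column_2": [10, 3, 9, 12],
        "column_3": [4, 4, 1, 7],
        "column_4": ["a", "b", "a", "a"],
        "column_5": ["x", "x", "value", "x"],
    },
    index=[100, 101, 102, 103],
)
records = pd.DataFrame(
    {
        "attribute_1": [3, 9],
        "attribute_2": [5, 2],
        "attribute_3": [5, 8],
        "attribute_4": ["a", "a"],
    }
)


def matches_by_loop(df, records):
    """Return {record label: [matching df labels]}, building one boolean mask per record."""
    result = {}
    for record in records.itertuples():
        mask = (
            (df["column_1"] < record.attribute_1)
            & (record.attribute_2 < df["column_2"])
            & (df["column_3"] < record.attribute_3)
            & (record.attribute_4 == df["column_4"])
            & (df["column_5"] != "value")
        )
        # record.Index is the record's row label, provided by itertuples()
        result[record.Index] = df[mask].index.tolist()
    return result


print(matches_by_loop(df, records))  # {0: [100], 1: [100, 103]}
```

Each iteration scans all of df once per condition, so the cost grows with len(records) * len(df) * number of conditions, all driven from a Python-level loop.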
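You can perform a cross merge (merge with how='cross', available since pandas 1.2), which pairs every row of df with every record, and then filter the result with DataFrame.query. This replaces the per-record Python loop with a single vectorized filter: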
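# records plays the role of df2; reset_index() keeps df's original row labels
# in an 'index' column (assuming df's index is unnamed), since merge() returns a new RangeIndex
qs = ("(column_1 < attribute_1)"
      " & (attribute_2 < column_2)"
      " & (column_3 < attribute_3)"
      " & (attribute_4 == column_4)"
      " & (column_5 != 'value')")  # 'value' must be quoted to be a string literal
matches = df.reset_index().merge(records, how='cross').query(qs)
matches_ids = matches['index'].tolist()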
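A minimal end-to-end sketch of the cross-merge approach on hypothetical toy frames. The helper name matches_by_cross_merge and the df_index/record_index column names are illustrative assumptions, not part of the answer. Both sides keep their labels so the hits can be grouped back per record, giving the same shape of result as the loop.

```python
import pandas as pd

# Hypothetical toy data standing in for the real df and records.
df = pd.DataFrame(
    {
        "column_1": [1, 5, 2, 8],
        "column_2": [10, 3, 9, 12],
        "column_3": [4, 4, 1, 7],
        "column_4": ["a", "b", "a", "a"],
        "column_5": ["x", "x", "value", "x"],
    },
    index=[100, 101, 102, 103],
)
records = pd.DataFrame(
    {
        "attribute_1": [3, 9],
        "attribute_2": [5, 2],
        "attribute_3": [5, 8],
        "attribute_4": ["a", "a"],
    }
)


def matches_by_cross_merge(df, records):
    """Return {record label: [matching df labels]} via one cross merge and one query."""
    # Turn both indexes into named columns; merge() discards the original labels.
    left = df.rename_axis("df_index").reset_index()
    right = records.rename_axis("record_index").reset_index()
    # Every df row paired with every record: len(df) * len(records) rows.
    pairs = left.merge(right, how="cross")
    qs = (
        "(column_1 < attribute_1)"
        " & (attribute_2 < column_2)"
        " & (column_3 < attribute_3)"
        " & (attribute_4 == column_4)"
        " & (column_5 != 'value')"
    )
    hits = pairs.query(qs)
    # Collect matching df labels per record; records with no match are absent.
    return hits.groupby("record_index")["df_index"].apply(list).to_dict()


print(matches_by_cross_merge(df, records))  # {0: [100], 1: [100, 103]}
```

The trade-off is memory: the cross merge materializes len(df) * len(records) rows before filtering, so for large inputs it may be necessary to cross-merge records in chunks and concatenate the hits.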