Pandas DF referencing the same slice twice in the same computation

Question

I have a huge data set to process, and I am trying to optimize the most costly line in terms of processing time.

I use a DataFrame df with three columns, A, B and C. I have two values, a and b, which are used to update the value of C in a subset of the rows of df.

Before I continue, let me define a textual substitution to increase readability:

filter(_X) -> df.loc[df['A'] < a, _X]

Every time I write "filter", please substitute the text on the right, with the actual argument in place of the parameter _X (think C/C++ macros). The line of code in question is:

filter('C') += a * np.minimum(filter('B'), b)

What I'm not sure about is whether Python will evaluate "filter" twice when evaluating the expression, or whether it will use a "reference" (à la C++) and do it only once. In the former case, is there a way to rewrite the expression so that the code of "filter" is not executed twice?

Moreover, if you have suggestions on how to rewrite the "filter" itself, I'd be happy to test them.

EDIT: Expanded version of the code:

df.loc[df['A'] < a, 'C'] += a * np.minimum(df.loc[df['A'] < a, 'B'], b)
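
For reference, here is a minimal sketch of one possible rewrite, not a benchmarked solution. In the expanded line the comparison df['A'] < a appears twice, and Python evaluates each occurrence separately. Storing the boolean mask in a variable means the comparison runs only once. The sample data and the values chosen for a and b are hypothetical, picked only for illustration. Note that the augmented assignment still performs one read and one write through .loc, since `x[k] += v` is a get followed by a set.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names A, B, C and the names a, b
# follow the question, the values are made up for illustration.
df = pd.DataFrame({
    "A": [1, 5, 2, 8],
    "B": [10.0, 3.0, 7.0, 1.0],
    "C": [0.0, 0.0, 0.0, 0.0],
})
a, b = 4, 6.0

# Evaluate the row filter once and reuse the resulting boolean Series.
mask = df["A"] < a

# Same update as the expanded line, but the comparison is not repeated.
df.loc[mask, "C"] += a * np.minimum(df.loc[mask, "B"], b)
```

Whether this gives a measurable speed-up on a large data set would need to be timed on the real data.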
