Dirich

Reputation: 442

Pandas DF referencing the same slice twice in the same computation

I have a huge data set to process, and I am trying to optimize the line that is most costly in terms of processing time.

I use a df with 3 columns, A, B and C. I have 2 values, a and b, which are used to update the value of C in a subset of the df.

Before I continue, let me define a textual substitution to increase readability:

filter(_X) -> df.loc[df['A'] < a, _X]

Every time I type "filter", please substitute it with the text on the right (applying the correct argument in place of the parameter _X - think C/C++ macros). The line of code in question is:

filter('C') += a * np.minimum(filter('B'), b)

What I'm not sure about is whether Python will evaluate "filter" twice when evaluating the expression, or whether it will use a "reference" (à la C++) and do it only once. In the former case, is there a way to rewrite the expression so that the code of "filter" is not executed twice?

Moreover, if you have suggestions on how to rewrite the "filter" itself, I'd be happy to test them.

EDIT: Expanded version of the code:

df.loc[df['A'] < a, 'C'] += a * np.minimum(df.loc[df['A'] < a, 'B'], b)

Upvotes: 1

Views: 381

Answers (1)

MattR

Reputation: 5126

If I understand correctly, you may not need to "filter twice" after the +=. See my example below:

import numpy as np
import pandas as pd
np.random.seed(5)
df = pd.DataFrame(np.random.randint(0, 100, size=(4, 4)), columns=list('ABCD'))


    A   B   C   D
0  99  78  61  16
1  73   8  62  27
2  30  80   7  76
3  15  53  80  27

Now, if you wanted to add the minimum of columns C and D to the current value of B wherever A < 80, that would simply be:

df.loc[df['A'] < 80, 'B'] += np.minimum(df['C'], df['D'])

    A     B   C   D
0  99  78.0  61  16
1  73  35.0  62  27  # <--- meets condition: 8 + 27 = 35
2  30  87.0   7  76  # <--- meets condition: 80 + 7 = 87
3  15  80.0  80  27  # <--- meets condition: 53 + 27 = 80

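Notice how, where A < 80, the B value increases by whichever of C or D is smaller. One thing to note is that B turns into a float. This is likely because the right-hand side spans all rows while the left-hand side covers only the filtered ones, so pandas aligns the two on the index and the intermediate sum contains NaN for row 0, which forces a float dtype.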
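
Applied to the expression in the question, one option is to build the boolean mask once and reuse it. Python has no macro or reference mechanism here: each df.loc[df['A'] < a, ...] is evaluated where it is written, so the expanded line runs the comparison twice. A minimal sketch, assuming made-up sample data and made-up values for a and b (none are given in the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data and values for a and b (not from the question).
df = pd.DataFrame({
    "A": [1, 5, 3, 8],
    "B": [10.0, 2.0, 7.0, 4.0],
    "C": [0.0, 0.0, 0.0, 0.0],
})
a, b = 4, 6

# Evaluate the comparison once and keep the resulting boolean Series.
mask = df["A"] < a

# Reuse the mask on both sides. Because the right-hand side is restricted
# to the same rows, both operands share the same index and alignment
# does not introduce NaN.
df.loc[mask, "C"] += a * np.minimum(df.loc[mask, "B"], b)

print(df)
```

This still performs two .loc lookups (one for B, one for C), but the comparison itself is computed only once. Whether that is noticeably faster on a large frame is something to measure, for example with timeit, rather than assume.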

Upvotes: 1
