user1718097

Reputation: 4292

Applying function to each row of pandas data frame - with speed

I have a dataframe that has the following basic structure:

import numpy as np
import pandas as pd
tempDF = pd.DataFrame({
    'condition': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'decision': [np.nan] * 10,
    'x1': [1.2, -2.3, -2.1, 2.4, -4.3, 2.1, -3.4, -4.1, 3.2, -3.3],
    'y1': [6.5, -7.6, -3.4, -5.3, 7.6, 5.2, -4.1, -3.3, -5.7, 5.3],
})
print(tempDF)
   condition  decision   x1   y1
0          0       NaN  1.2  6.5
1          0       NaN -2.3 -7.6
2          0       NaN -2.1 -3.4
3          0       NaN  2.4 -5.3
4          0       NaN -4.3  7.6
5          1       NaN  2.1  5.2
6          1       NaN -3.4 -4.1
7          1       NaN -4.1 -3.3
8          1       NaN  3.2 -5.7
9          1       NaN -3.3  5.3

Within each row, I want to set the 'decision' column to 0 if the 'condition' column equals 0 and 'x1' and 'y1' have the same sign (for the purposes of this script, zero is considered positive). If the signs of 'x1' and 'y1' differ, or if the 'condition' column equals 1 (regardless of the signs of 'x1' and 'y1'), the 'decision' column should equal 1.

I can iterate over each row of the dataframe as follows:

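for i in range(len(tempDF)):
    # .loc replaces .ix, which newer pandas versions no longer provide.
    # The whole sign test is parenthesised because 'and' binds tighter than 'or'.
    if (tempDF.loc[i,'condition'] == 0 and (((tempDF.loc[i,'x1'] >= 0) and (tempDF.loc[i,'y1'] >= 0)) or ((tempDF.loc[i,'x1'] < 0) and (tempDF.loc[i,'y1'] < 0)))):
        tempDF.loc[i,'decision'] = 0
    else:
        tempDF.loc[i,'decision'] = 1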

print(tempDF)
   condition  decision   x1   y1
0          0         0  1.2  6.5
1          0         0 -2.3 -7.6
2          0         0 -2.1 -3.4
3          0         1  2.4 -5.3
4          0         1 -4.3  7.6
5          1         1  2.1  5.2
6          1         1 -3.4 -4.1
7          1         1 -4.1 -3.3
8          1         1  3.2 -5.7
9          1         1 -3.3  5.3

This produces the right output but it's a bit slow. The real dataframe I have is very large and these comparisons will need to be made many times. Is there a more efficient way to achieve the desired result?

Upvotes: 0

Views: 581

Answers (1)

jme

Reputation: 20695

First (writing df for the question's tempDF), use np.sign and the comparison operators to create a boolean array that is True where the decision should be 1:

decision = df["condition"] | (np.sign(df["x1"]) != np.sign(df["y1"]))

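Here I've used De Morgan's laws: the decision is 0 only when the condition is 0 and the signs match, so it is 1 when the condition is nonzero or the signs differ.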
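One caveat worth noting: np.sign returns 0 for a zero input, so a row with x1 equal to 0 and y1 positive would be counted as a sign mismatch, whereas the question treats zero as positive. A possible variant, sketched here on a small made-up frame (the name demo and its values are assumptions for illustration only), compares the results of >= 0 instead:

```python
import pandas as pd

# Hypothetical frame that includes zeros to exercise the edge case
demo = pd.DataFrame({
    "condition": [0, 0, 0, 1],
    "x1": [0.0, -1.0, 2.0, 0.0],
    "y1": [3.0, -2.0, -5.0, 3.0],
})

# True where the value is "positive" in the question's sense (zero included)
x_pos = demo["x1"] >= 0
y_pos = demo["y1"] >= 0

# decision is 1 when condition is nonzero OR the signs differ
decision = (demo["condition"] != 0) | (x_pos != y_pos)
demo["decision"] = decision.astype(int)
print(demo)
```

Comparing the condition column with != 0 also makes both operands of | boolean, which avoids relying on how an integer column combines with a boolean one.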

Then cast to int and put it in the dataframe:

df["decision"] = decision.astype(int)

Giving:

>>> df
   condition  decision   x1   y1
0          0         0  1.2  6.5
1          0         0 -2.3 -7.6
2          0         0 -2.1 -3.4
3          0         1  2.4 -5.3
4          0         1 -4.3  7.6
5          1         1  2.1  5.2
6          1         1 -3.4 -4.1
7          1         1 -4.1 -3.3
8          1         1  3.2 -5.7
9          1         1 -3.3  5.3
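
As a sanity check, the vectorised result can be compared with a direct row-by-row statement of the rule. This is only a sketch: row_rule, slow and fast are names introduced here for illustration, and the data is the sample from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "condition": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "x1": [1.2, -2.3, -2.1, 2.4, -4.3, 2.1, -3.4, -4.1, 3.2, -3.3],
    "y1": [6.5, -7.6, -3.4, -5.3, 7.6, 5.2, -4.1, -3.3, -5.7, 5.3],
})

def row_rule(row):
    """Row-wise statement of the rule (zero counts as positive)."""
    same_sign = (row["x1"] >= 0) == (row["y1"] >= 0)
    return 0 if (row["condition"] == 0 and same_sign) else 1

# Slow reference: one Python call per row
slow = df.apply(row_rule, axis=1)

# Vectorised version following the approach above
fast = ((df["condition"] != 0)
        | (np.sign(df["x1"]) != np.sign(df["y1"]))).astype(int)

# Check that both approaches agree on every row of the sample
print((slow == fast).all())
```

The sample contains no zeros, so the two versions agree here; the row-wise version is meant only as a reference, since it is the slow pattern the question is trying to avoid.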

Upvotes: 1
