natnay
natnay

Reputation: 490

Loop through grouped rows and compare 1st row in group- Python

I have data that is grouped by origin and destination pairs. I would like to compare the first row in the group against the other rows in the group. If a row meets certain conditions, then I would like to flag it. Below is a snapshot of my data:

Orig    Dest    Route   Vol     Per   VolPct
ORD     ICN     A       2,251   0.64    0.78
ORD     ICN     B       366     0.97    0.13
ORD     ICN     C       142     0.14    0.05
ORD     ICN     D       100     0.22    0.03
ORD     ICN     E       25      1.00    0.01

If a row has a "VolPct" of >=.1, then it gets compared against the 1st row. So in this example, Route B would be compared against Route A. If the difference between "Per" of Route B and Route A is >= .2, then I'd like to flag Route B by putting an "X" in a new column. In this example, Route B would get flagged. Expected output would look like this:

Orig    Dest    Route   Vol      Per    VolPct  Flag
ORD     ICN     A       2,251    0.64   0.78    
ORD     ICN     B       366      0.97   0.13       x
ORD     ICN     C       142      0.14   0.05    
ORD     ICN     D       100      0.22   0.03    
ORD     ICN     E       25       1.00   0.01    

Any suggestions how to go about this? I'm using Pandas and Python 3.6. Thanks for your help!

Upvotes: 4

Views: 573

Answers (1)

cs95
cs95

Reputation: 402503

IIUC, you can do this with groupby + first. Use np.where for determining values for Flag.

i = df.VolPct >= .1
j = (df.Per - df.groupby('Orig').Per.transform('first') >= .2)   

df['Flag'] = np.where(i & j, 'X', '')
df

  Orig Dest Route    Vol   Per  VolPct Flag
0  ORD  ICN     A  2,251  0.64    0.78     
1  ORD  ICN     B    366  0.97    0.13    X
2  ORD  ICN     C    142  0.14    0.05     
3  ORD  ICN     D    100  0.22    0.03     
4  ORD  ICN     E     25  1.00    0.01  

Upvotes: 5

Related Questions