Leo
Leo

Reputation: 13

Loop through rows of a dataframe, create a new column and store the result based on condition in another column

I have a df as follows:

Name    Reference   Efficiency
TargetA    Yes      13
Target_1    No      12
Target_2    No      13
Target_3    No      10
Target_4    No      8
TargetB     Yes     14
Target_4    No      12
Target_5    No      11
Target_6    No     10
TargetC     Yes    15
Target_6    No      11
Target_7    No      13
Target_8    No      12
Target_9    No      14
Target_10   No     10

I want to loop through all the rows, wherever there is 'Yes' in Reference column, it will create another column called 'Check' and subtract Efficiency values(13,12,13,10,8) from 13(which is the corresponding value of 'Yes'. Next it will subtract Efficiency values(14,12,11,10) from 14(which is the corresponding value of 'Yes' for the next 'Yes') and so on.

Expected output:

Name    Reference   Efficiency  Check

TargetA    Yes            13    0
Target_1    No            12    1
Target_2    No            13    0
Target_3    No            10    3
Target_4    No             8    5
TargetB     Yes           14    0
Target_4    No            12    2
Target_5    No            11    3
Target_6    No            10    4
TargetC     Yes           15    0
Target_6    No            11    4
Target_7    No            13    2
Target_8    No            12    3
Target_9    No            14    1
Target_10   No            10    5

I have tried the following codes:

for i, row in df.iterrows():
    i = 0
    val = row['Reference']
    if val == 'Yes':
        df['check'] = df.loc[i,'Efficiency'] - df['Efficiency'].shift(0)

I got the following result:

Name    Reference   Efficiency  Check
0   TargetA     Yes           13    0
1   Target_1    No            12    1
2   Target_2    No            13    0
3   Target_3    No            10    3
4   Target_4    No             8    5
5   TargetB     Yes           14    -1
6   Target_4    No            12    1
7   Target_5    No            11    2
8   Target_6    No            10    3
9   TargetC     Yes           15    -2
10  Target_6    No            11    2
11  Target_7    No            13    0
12  Target_8    No            12    1
13  Target_9    No            14    -1
14  Target_10   No            10    3

I got the result correctly in the first 'Yes' Please can someone help me

Upvotes: 1

Views: 657

Answers (2)

Leo
Leo

Reputation: 13

I also used the code below to create an auxiliary / helper column, only containing the Efficiencies where "Yes" was found:

for i, row in df.iterrows():   
    val = row['Reference']
    if val == 'Yes':
        df['check'] = df[df['Reference']=='Yes']['Efficiency']

Upvotes: 0

Andreas
Andreas

Reputation: 9207

Create an auxillary / helper column, only containing the Efficiencies where "Yes" was found. Then replace missing values with the previous valid entries, go through the example step by step:

Sample data:

import pandas as pd
data = {'Name': {0: 'TargetA',
  1: 'Target_1',
  2: 'Target_2',
  3: 'Target_3',
  4: 'Target_4',
  5: 'TargetB',
  6: 'Target_4',
  7: 'Target_5',
  8: 'Target_6',
  9: 'TargetC',
  10: 'Target_6',
  11: 'Target_7',
  12: 'Target_8',
  13: 'Target_9',
  14: 'Target_10'},
 'Reference': {0: 'Yes',
  1: 'No',
  2: 'No',
  3: 'No',
  4: 'No',
  5: 'Yes',
  6: 'No',
  7: 'No',
  8: 'No',
  9: 'Yes',
  10: 'No',
  11: 'No',
  12: 'No',
  13: 'No',
  14: 'No'},
 'Efficiency': {0: 13,
  1: 12,
  2: 13,
  3: 10,
  4: 8,
  5: 14,
  6: 12,
  7: 11,
  8: 10,
  9: 15,
  10: 11,
  11: 13,
  12: 12,
  13: 14,
  14: 10}}
df = pd.DataFrame(data)

Code:

mask = df['Reference'].eq('Yes')
df['Check'] = pd.NA
df.loc[mask, 'Check'] = df['Efficiency'].loc[mask].copy()
df['Check'] = df['Check'].ffill()
df['Check'] = df['Check'] - df['Efficiency']

Upvotes: 1

Related Questions