Reputation: 13
I have a df as follows:
Name Reference Efficiency
TargetA Yes 13
Target_1 No 12
Target_2 No 13
Target_3 No 10
Target_4 No 8
TargetB Yes 14
Target_4 No 12
Target_5 No 11
Target_6 No 10
TargetC Yes 15
Target_6 No 11
Target_7 No 13
Target_8 No 12
Target_9 No 14
Target_10 No 10
I want to loop through all the rows, wherever there is 'Yes' in Reference column, it will create another column called 'Check' and subtract Efficiency values(13,12,13,10,8) from 13(which is the corresponding value of 'Yes'. Next it will subtract Efficiency values(14,12,11,10) from 14(which is the corresponding value of 'Yes' for the next 'Yes') and so on.
Expected output:
Name Reference Efficiency Check
TargetA Yes 13 0
Target_1 No 12 1
Target_2 No 13 0
Target_3 No 10 3
Target_4 No 8 5
TargetB Yes 14 0
Target_4 No 12 2
Target_5 No 11 3
Target_6 No 10 4
TargetC Yes 15 0
Target_6 No 11 4
Target_7 No 13 2
Target_8 No 12 3
Target_9 No 14 1
Target_10 No 10 5
I have tried the following codes:
for i, row in df.iterrows():
i = 0
val = row['Reference']
if val == 'Yes':
df['check'] = df.loc[i,'Efficiency'] - df['Efficiency'].shift(0)
I got the following result:
Name Reference Efficiency Check
0 TargetA Yes 13 0
1 Target_1 No 12 1
2 Target_2 No 13 0
3 Target_3 No 10 3
4 Target_4 No 8 5
5 TargetB Yes 14 -1
6 Target_4 No 12 1
7 Target_5 No 11 2
8 Target_6 No 10 3
9 TargetC Yes 15 -2
10 Target_6 No 11 2
11 Target_7 No 13 0
12 Target_8 No 12 1
13 Target_9 No 14 -1
14 Target_10 No 10 3
I got the result correctly in the first 'Yes' Please can someone help me
Upvotes: 1
Views: 657
Reputation: 13
I also used the code below to create an auxiliary / helper column, only containing the Efficiencies where "Yes" was found:
for i, row in df.iterrows():
val = row['Reference']
if val == 'Yes':
df['check'] = df[df['Reference']=='Yes']['Efficiency']
Upvotes: 0
Reputation: 9207
Create an auxillary / helper column, only containing the Efficiencies where "Yes" was found. Then replace missing values with the previous valid entries, go through the example step by step:
Sample data:
import pandas as pd
data = {'Name': {0: 'TargetA',
1: 'Target_1',
2: 'Target_2',
3: 'Target_3',
4: 'Target_4',
5: 'TargetB',
6: 'Target_4',
7: 'Target_5',
8: 'Target_6',
9: 'TargetC',
10: 'Target_6',
11: 'Target_7',
12: 'Target_8',
13: 'Target_9',
14: 'Target_10'},
'Reference': {0: 'Yes',
1: 'No',
2: 'No',
3: 'No',
4: 'No',
5: 'Yes',
6: 'No',
7: 'No',
8: 'No',
9: 'Yes',
10: 'No',
11: 'No',
12: 'No',
13: 'No',
14: 'No'},
'Efficiency': {0: 13,
1: 12,
2: 13,
3: 10,
4: 8,
5: 14,
6: 12,
7: 11,
8: 10,
9: 15,
10: 11,
11: 13,
12: 12,
13: 14,
14: 10}}
df = pd.DataFrame(data)
Code:
mask = df['Reference'].eq('Yes')
df['Check'] = pd.NA
df.loc[mask, 'Check'] = df['Efficiency'].loc[mask].copy()
df['Check'] = df['Check'].ffill()
df['Check'] = df['Check'] - df['Efficiency']
Upvotes: 1