Reputation: 19
I have a two-column numerical dataframe, and I'm trying to add a 3rd column.
Row  col1  col2
0    8     8
1    8     4
2    6     2
3    3     7
4    6     4
5    2     6
Where in the first row,
col3 = max(col1 - col2,0)
and on the rest of the rows,
col3 = max(col1 - col2 + col3_of_the_row_above, 0)
The resulting dataframe should look like this:
Row  col1  col2  col3
0    8     8     0
1    8     4     4
2    6     2     8
3    3     7     4
4    6     4     6
5    2     6     2
Is there an efficient way to do this?
Upvotes: 1
Views: 777
Reputation: 122
To create a new column you can just do this:
df['col3'] = 0 # all the rows will be filled with zeros
col3 will be added to your dataframe.
Because the first row is calculated differently from the others, you'll need to set it manually.
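# Use .loc for the assignment; chained indexing like df['col3'][0] = ... may not update df
df.loc[0, 'col3'] = max(df.loc[0, 'col1'] - df.loc[0, 'col2'], 0)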
The other rows all use the same formula, so you can fill them in with a for loop.
for row in range(1, len(df)):
    # Each row adds its own difference to the previous row's col3, floored at 0
    df.loc[row, 'col3'] = max(df.loc[row, 'col1'] - df.loc[row, 'col2'] + df.loc[row - 1, 'col3'], 0)
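For larger frames, per-cell DataFrame access inside a loop tends to be slow. Here is a minimal sketch of the same loop run over a plain NumPy array instead, assuming the question's sample data. The names diff and running are only illustrative. Starting the running value at 0 means the first row needs no special case.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"col1": [8, 8, 6, 3, 6, 2],
                   "col2": [8, 4, 2, 7, 4, 6]})

# Pull the per-row differences out once as a NumPy array
diff = (df["col1"] - df["col2"]).to_numpy()
col3 = np.empty(len(diff), dtype=diff.dtype)

running = 0  # seed of 0 makes row 0 follow the same rule as the others
for i, d in enumerate(diff):
    running = max(running + d, 0)  # add this row's difference, floor at 0
    col3[i] = running

df["col3"] = col3
print(df["col3"].tolist())  # [0, 4, 8, 4, 6, 2]
```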
P.S.: A plain list comprehension won't work here, because each value depends on the previous result, and that result isn't available while the list is still being built. itertools.accumulate carries the running value for you, so I'll put that version here too so you can study it.
from itertools import accumulate
# initial=0 lets the first row use the same rule; [1:] drops that seed value (needs Python 3.8+)
df['col3'] = list(accumulate(df['col1'] - df['col2'], lambda prev, diff: max(prev + diff, 0), initial=0))[1:]
This is more compact, but it can be a little confusing at first sight.
Upvotes: 2
Reputation: 153460
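Try this. Note that only the first row is clipped at zero; the rest is a plain cumulative sum, so it matches the formula only while the running total stays non-negative, as it does for the sample data: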
# Calculate the value for the first row, clipping it at a lower bound of zero
s = (df.iloc[0, df.columns.get_loc('col1')] - df.iloc[0, df.columns.get_loc('col2')]).clip(0,)
# Calculate difference for each row after first
df['col3'] = (df.iloc[1:, df.columns.get_loc('col1')] - df.iloc[1:, df.columns.get_loc('col2')])
# Fill 'col3' with first value then cumsum differences
df['col3'] = df['col3'].fillna(s).cumsum()
df
Output:
col1 col2 col3
Row
0 8 8 0.0
1 8 4 4.0
2 6 2 8.0
3 3 7 4.0
4 6 4 6.0
5 2 6 2.0
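A sketch of a vectorized variant that applies the zero floor on every row rather than only the first. It relies on the identity that a running sum floored at zero equals the plain cumulative sum minus the lowest value that sum has reached so far, with that lowest value capped at 0. The helper name floored_cumsum is hypothetical, not a pandas or NumPy function.

```python
import numpy as np
import pandas as pd

def floored_cumsum(diff):  # hypothetical helper name
    """Running sum of diff that never drops below 0."""
    total = np.cumsum(np.asarray(diff))
    # Lowest running total seen so far, capped at 0
    lowest = np.minimum(np.minimum.accumulate(total), 0)
    # Subtracting it cancels every dip below zero
    return total - lowest

# Sample data from the question
df = pd.DataFrame({"col1": [8, 8, 6, 3, 6, 2],
                   "col2": [8, 4, 2, 7, 4, 6]})
df["col3"] = floored_cumsum(df["col1"] - df["col2"])
print(df["col3"].tolist())  # [0, 4, 8, 4, 6, 2]

# A case where the running total would go negative without the floor
print(floored_cumsum([-3, 2, -5, 4]).tolist())  # [0, 2, 0, 4]
```

This keeps col3 as integers and avoids a Python-level loop, at the cost of being less obvious than the explicit recurrence.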
Upvotes: 0