Reputation: 153

Creating a pandas column that performs a calculation on another column, but changes the variable in the calculation based on a third column

I currently have a dataframe that contains a column called load, and I want to create a column called calculated load that applies a simple formula to load using a variable. However, I want the calculation to switch to a different variable once it sees the value 1 in a column called position, and to keep using that variable until it sees -1 in position, at which point the values start to rise again. Here is my current code:

import pandas as pd

s_falling = -4
s_rising = 2
x = 2

df = pd.DataFrame({"load": [1,2,4,6,2,4,7,4,8,3,4,7,3,3,6,4,7,4,3,2],
                   "position": [0,0.2,0.5,0.8,0.7,1,0.7,0.6,0.7,0.8,0.4,0.2,0,-0.5,-0.8,-1,-0.8,-0.9,-0.7,-0.6]})

df['calculated load'] = df['load'] + x * s_rising

print(df['calculated load'])

0      5
1      6
2      8
3     10
4      6
5      8
6     11
7      8
8     12
9      7
10     8
11    11
12     7
13     7
14    10
15     8
16    11
17     8
18     7
19     6

This works up to the row where position is 1. From the row after that, when the values start falling, I want to use the formula below, which swaps s_rising for s_falling, and to keep using it for every subsequent row. From the row after -1 is seen in position, it should revert to the original formula with s_rising:

df['calculated load'] = df['load'] + x * s_falling

The formula doesn't change, merely the variable being used within it.

I can't just check whether each value is lower or higher than the previous one, as the values in position don't rise and fall monotonically. Ideally, this would be my desired output:

print(df['calculated load'])

0      5
1      6
2      8
3     10
4      6
5      8
6      3
7      0
8      4
9     -1
10     0
11     3
12    -1
13    -1
14     2
15     0
16    11
17     8
18     7
19     8

EDIT: Some very kind people have offered solutions, and I have realised that my question (designed to produce a small, reproducible example) was slightly off the mark. I have edited the question to reflect this.

Upvotes: 0

Answers (3)

ragas

Reputation: 916

Check this and let me know if it works.

import pandas as pd

x = 2

df = pd.DataFrame({"load": [1,2,4,6,2,4,7,4,8,3,4,7,3,3,6,4,7,4,3,2],
                   "position": [0,0.2,0.5,0.8,0.7,1,0.7,0.6,0.7,0.8,0.4,0.2,0,-0.5,-0.8,-1,-0.8,-0.9,-0.7,-0.6]})

# Index of the first row where position is 1, and of the first where it is -1
start = df[df['position'] == 1.0].index[0]
stop = df[df['position'] == -1.0].index[0]

for i, row in df.iterrows():
    if start <= i < stop:
        # Between the two markers: subtract x
        df.loc[i, 'calculated load'] = df.loc[i, 'load'] - x
    else:
        # Everywhere else: add x
        df.loc[i, 'calculated load'] = df.loc[i, 'load'] + x

print(df)

    load  position  calculated load
0      1       0.0              3.0
1      2       0.2              4.0
2      4       0.5              6.0
3      6       0.8              8.0
4      2       0.7              4.0
5      4       1.0              2.0
6      7       0.7              5.0
7      4       0.6              2.0
8      8       0.7              6.0
9      3       0.8              1.0
10     4       0.4              2.0
11     7       0.2              5.0
12     3       0.0              1.0
13     3      -0.5              1.0
14     6      -0.8              4.0
15     4      -1.0              6.0
16     7      -0.8              9.0
17     4      -0.9              6.0
18     3      -0.7              5.0
19     2      -0.6              4.0

Upvotes: 1

Corralien

Reputation: 120399

Without a loop, you can use:

import numpy as np

x1 = df['position'].eq(1).mul(-x).shift(fill_value=0)
x2 = df['position'].eq(-1).mul(x)
xm = (x1 | x2).replace(0, np.nan).ffill().fillna(x).astype(int)

df['calculated load'] = df['load'] + xm

Output:

>>> df
    load  position  calculated load
0      1       0.0                3
1      2       0.2                4
2      4       0.5                6
3      6       0.8                8
4      2       0.7                4
5      4       1.0                6
6      7       0.7                5
7      4       0.6                2
8      8       0.7                6
9      3       0.8                1
10     4       0.4                2
11     7       0.2                5
12     3       0.0                1
13     3      -0.5                1
14     6      -0.8                4
15     4      -1.0                6
16     7      -0.8                9
17     4      -0.9                6
18     3      -0.7                5
19     2      -0.6                4
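
The same forward-fill idea might also be applied to the multiplier from the question rather than to x itself. The following is only a sketch under one reading of the question: the formula is load + x * s, and s switches on the row after each trigger (1 or -1 in position). The helper name s and the use of np.select are my own choices, and the result is not checked against the question's desired output.

```python
import numpy as np
import pandas as pd

s_falling = -4
s_rising = 2
x = 2

df = pd.DataFrame({
    "load": [1, 2, 4, 6, 2, 4, 7, 4, 8, 3, 4, 7, 3, 3, 6, 4, 7, 4, 3, 2],
    "position": [0, 0.2, 0.5, 0.8, 0.7, 1, 0.7, 0.6, 0.7, 0.8,
                 0.4, 0.2, 0, -0.5, -0.8, -1, -0.8, -0.9, -0.7, -0.6],
})

pos = df["position"]

# Multiplier selected at each trigger row, NaN on every other row.
s = pd.Series(
    np.select([pos.eq(1), pos.eq(-1)],
              [float(s_falling), float(s_rising)],
              default=np.nan),
    index=df.index,
)

# shift() delays each switch to the row after its trigger, ffill() carries it
# forward, and rows before the first trigger fall back to s_rising (assumption).
s = s.shift().ffill().fillna(s_rising)

df["calculated load"] = (df["load"] + x * s).astype(int)
```

If the switch should happen on the trigger row itself, dropping the shift() call should be enough.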

Upvotes: 1

ArrowRise

Reputation: 648

I believe this code works, but it's not efficient because of iterrows(). If someone finds a way to vectorize it, feel free to comment on my answer.

import pandas as pd

x = 2

df = pd.DataFrame({"load": [1,2,4,6,2,4,7,4,8,3,4,7,3,3,6,4,7,4,3,2],
                   "position": [0,0.2,0.5,0.8,0.7,1,0.7,0.6,0.7,0.8,0.4,0.2,0,-0.5,-0.8,-1,-0.8,-0.9,-0.7,-0.6]})

increasing = True
list_increasing = []
for index, row in df.iterrows():
    if increasing and row.position == 1:
        increasing = False
    elif not increasing and row.position == -1:
        increasing = True
    list_increasing.append(increasing)
    
df['increasing'] = list_increasing

def calculated_load(row):
    if row.increasing:
        return row.load + x
    else:
        return row.load - x

df['cal load'] = df.apply(calculated_load, axis=1)
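
One possible way to vectorize the flag loop above, as a sketch rather than a benchmarked solution: mark the rows where the state changes, forward-fill that marker, and pick the formula with np.where. The helper name switch is hypothetical, and I am assuming, as in the loop, that the state starts as increasing and flips on the trigger row itself.

```python
import numpy as np
import pandas as pd

x = 2

df = pd.DataFrame({
    "load": [1, 2, 4, 6, 2, 4, 7, 4, 8, 3, 4, 7, 3, 3, 6, 4, 7, 4, 3, 2],
    "position": [0, 0.2, 0.5, 0.8, 0.7, 1, 0.7, 0.6, 0.7, 0.8,
                 0.4, 0.2, 0, -0.5, -0.8, -1, -0.8, -0.9, -0.7, -0.6],
})

# 0.0 where the state flips to "not increasing" (position == 1),
# 1.0 where it flips back to "increasing" (position == -1), NaN elsewhere.
switch = pd.Series(np.nan, index=df.index)
switch[df["position"].eq(1)] = 0.0
switch[df["position"].eq(-1)] = 1.0

# Carry the last flip forward; rows before the first flip count as increasing.
df["increasing"] = switch.ffill().fillna(1.0).astype(bool)

# Same formulas as calculated_load(), applied to whole columns at once.
df["cal load"] = np.where(df["increasing"], df["load"] + x, df["load"] - x)
```

Because a 1 always sets the flag to False and a -1 always sets it to True, the forward fill should give the same flags as the loop without tracking the previous state explicitly.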

Upvotes: 1
