Jane Doe
Jane Doe

Reputation: 59

Python dataframe new column with value based on value in other row

I have another simple question as a follow up from here. Python pandas dataframe sum numbers until they change

I want to analyze a serie of numbers which go up and down. So far I have this code.

df = pd.DataFrame([8, 2, 4, 5, 6, 4, 3, 5, 5, 3, 2], columns=['A'])
df['UD'] = np.where(df['A'].pct_change().gt(0), 1, np.where(df['A'].pct_change().lt(0), -1, 0))
df['ST'] = df['UD'].groupby((df['UD'].ne(df['UD'].shift(1))).cumsum()).cumsum()
df['TB'] = np.where(df['A'].pct_change().ge(0) & df['A'].pct_change().shift(-1).le(0), 'T', np.where(df['A'].pct_change().le(0) & df['A'].pct_change().shift(-1).ge(0), 'B', ''))
print(df)

Now I want a new column with how much the number changed since the last Top or Bottom. So the output will look like this.

    A  UD  ST TB     C
0   8   0   0     
1   2  -1  -1  B  
2   4   1   1     2.00
3   5   1   2     2.50
4   6   1   3  T  3.00
5   4  -1  -1     0.67
6   3  -1  -2  B  0.50
7   5   1   1  T  1.67
8   5   0   0  T  1.00
9   3  -1  -1     0.60
10  2  -1  -2     0.40

My first thought was to use shift() with the value of column ST.

df['CS'] = df['A'] / df['A'].shift(df['ST'].abs())

This throws an error.

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My second thought was to use iloc with the value of column ST.

df['CS'] = df['A'] / df['A'].iloc[df.index - df['ST'].abs()]

But that throws another error.

ValueError: cannot reindex from a duplicate axis

What is the correct way to achieve this?

Upvotes: 0

Views: 118

Answers (1)

Behzad Shayegh
Behzad Shayegh

Reputation: 333

Here you are:

import numpy as np
df['LastTB'] = (df.A * df.TB.replace({'':np.nan, 'T':True, 'B':True})).fillna(method="ffill")
df['C'] = df.A / df.LastTB.shift()

and the output is:

    A  UD  ST TB  LastTB         C
0   8   0   0        NaN       NaN
1   2  -1  -1  B     2.0       NaN
2   4   1   1        2.0  2.000000
3   5   1   2        2.0  2.500000
4   6   1   3  T     6.0  3.000000
5   4  -1  -1        6.0  0.666667
6   3  -1  -2  B     3.0  0.500000
7   5   1   1  T     5.0  1.666667
8   5   0   0  T     5.0  1.000000
9   3  -1  -1        5.0  0.600000
10  2  -1  -2        5.0  0.400000

If you are determined to write in one line, you can combine both lines:

df['C'] = df.A / (df.A * df.TB.replace({'':np.nan, 'T':True, 'B':True})).fillna(method="ffill").shift()

Upvotes: 1

Related Questions