Ibs
Ibs

Reputation: 21

How can I create a new column on a Pandas Dataframe with conditions based on previous and next row?

I am trying to create a new Pandas column that indicates whether a number is the largest compared to the row above and the row below.

The column is going to be binary, where '1' indicates it is the highest number compared to last and next row, and '0' will indicate the condition is not satisfied.

This is how the data looks:

           Date      High
    0   2015-11-11  25.90
    1   2015-11-12  27.12
    2   2015-11-13  26.20
    3   2015-11-16  26.19
    4   2015-11-17  25.51
    5   2015-11-18  26.31
    6   2015-11-19  26.00
    7   2015-11-20  27.01
    8   2015-11-23  25.60
    9   2015-11-24  27.00
    10  2015-11-25  26.49

This is my desired outcome:

    Date        High    higher
0   2015-11-11  25.90   0.0
1   2015-11-12  27.12   1.0
2   2015-11-13  26.20   0.0
3   2015-11-16  26.19   0.0
4   2015-11-17  25.51   0.0
5   2015-11-18  26.31   1.0
6   2015-11-19  26.00   0.0
7   2015-11-20  27.01   1.0
8   2015-11-23  25.60   0.0
9   2015-11-24  27.00   1.0
10  2015-11-25  26.49   0.0

For further illustration, here is what I did on Microsoft Excel, which is exactly what I am trying to achieve with Python: ExcelExample

First Attempt:

for i in dftest['High']:
    if dftest['High'][i] > dftest['High'][i-1] and dftest['High'][i] > dftest['High'][i+1]:
        dftest['higher'] = 1
    else:
        dftest['higher'] = 0

This results in the error: "TypeError: cannot do label indexing on class 'pandas.core.indexes.range.RangeIndex' with these indexers [25.9] of class 'float'"

Second Attempt: Here I try to make use of '.shift()'.

for i in dftest['High']:
    if dftest['High'] > dftest['High'].shift(1) and dftest['High'] > dftest['High'].shift(-1):
        dftest['higher'] = 1
    else:
        dftest['higher'] = 0

This results in error: "TypeError: unsupported operand type(s) for &: 'float' and 'float'"

Third Attempt (similar to the second attempt): Used '&' operator instead of 'and'.

Results in error: "TypeError: cannot do label indexing on class 'pandas.core.indexes.range.RangeIndex' with these indexers [25.9] of class 'float'"

Would appreciate any sort of help!

Edit: I would really appreciate if the solution provided could easily be alternated if I wanted to find the highest of 5 or 7 or 9 consecutive numbers. Thank you again!

Upvotes: 2

Views: 87

Answers (4)

G.G
G.G

Reputation: 765

df1.assign(higher=df1.High.rolling(3,min_periods=3,center=True).max().eq(df1.High).astype(int))

    Date        High    higher
0   2015-11-11  25.90   0.0
1   2015-11-12  27.12   1.0
2   2015-11-13  26.20   0.0
3   2015-11-16  26.19   0.0
4   2015-11-17  25.51   0.0
5   2015-11-18  26.31   1.0
6   2015-11-19  26.00   0.0
7   2015-11-20  27.01   1.0
8   2015-11-23  25.60   0.0
9   2015-11-24  27.00   1.0
10  2015-11-25  26.49   0.0

Upvotes: 0

Ibs
Ibs

Reputation: 21

In the end, I decided to use this:

hf_three = []

try:
    for i in df.index:   
        if df.loc[i]['High']>df.loc[i+1]['High'] and df.loc[i]['High']>df.loc[i-1]['High']:
            hf_three.append(1)
        else:
            hf_three.append(0)
except ValueError:              # To handle cases where 'i-1' doesn't exist.
    hf_three.append(0)
except KeyError:                # To handle cases where 'i+1' doesn't exist
    hf_three.append(0)

df['higher'] = hf_three

Though this works, if I replace '>' for '<', it no longer works and will not even append. Solutions to this will be greatly appreciated!

Upvotes: 0

Matthew Borish
Matthew Borish

Reputation: 3086

You can use .pct_change() to see if values are increasing or decreasing, and then use np.where() to replace values on condition.

df = pd.read_clipboard()
df['higher'] = df['High'].pct_change()
df['higher'] = np.where(df['higher'] > 0, 1.0, 0.0)

print(df)


    Date        High    higher
0   2015-11-11  25.90   0.0
1   2015-11-12  27.12   1.0
2   2015-11-13  26.20   0.0
3   2015-11-16  26.19   0.0
4   2015-11-17  25.51   0.0
5   2015-11-18  26.31   1.0
6   2015-11-19  26.00   0.0
7   2015-11-20  27.01   1.0
8   2015-11-23  25.60   0.0
9   2015-11-24  27.00   1.0
10  2015-11-25  26.49   0.0

For your comment, you can do rolling max.

df['rolling_max'] = df['High'].rolling(5, min_periods=1).max()

print(df)

Date    High    higher  rolling_max
0   2015-11-11  25.90   0.0 25.90
1   2015-11-12  27.12   0.0 27.12
2   2015-11-13  26.20   0.0 27.12
3   2015-11-16  26.19   0.0 27.12
4   2015-11-17  25.51   0.0 27.12
5   2015-11-18  26.31   1.0 27.12
6   2015-11-19  26.00   0.0 26.31
7   2015-11-20  27.01   1.0 27.01
8   2015-11-23  25.60   0.0 27.01
9   2015-11-24  27.00   1.0 27.01
10  2015-11-25  26.49   1.0 27.01

Upvotes: 1

BENY
BENY

Reputation: 323226

This so called local max

from scipy.signal import argrelextrema
ary=argrelextrema(df.High.values,np.greater)
df['local max']=np.where(df.index.isin(ary[0]),1,0)
df
          Date   High  local max
0   2015-11-11  25.90          0
1   2015-11-12  27.12          1
2   2015-11-13  26.20          0
3   2015-11-16  26.19          0
4   2015-11-17  25.51          0
5   2015-11-18  26.31          1
6   2015-11-19  26.00          0
7   2015-11-20  27.01          1
8   2015-11-23  25.60          0
9   2015-11-24  27.00          1
10  2015-11-25  26.49          0

Upvotes: 1

Related Questions