Reputation: 21
I am trying to create a new Pandas column that indicates whether a number is the largest compared to the row above and the row below.
The column is going to be binary, where '1' indicates it is the highest number compared to last and next row, and '0' will indicate the condition is not satisfied.
This is how the data looks:
Date High
0 2015-11-11 25.90
1 2015-11-12 27.12
2 2015-11-13 26.20
3 2015-11-16 26.19
4 2015-11-17 25.51
5 2015-11-18 26.31
6 2015-11-19 26.00
7 2015-11-20 27.01
8 2015-11-23 25.60
9 2015-11-24 27.00
10 2015-11-25 26.49
This is my desired outcome:
Date High higher
0 2015-11-11 25.90 0.0
1 2015-11-12 27.12 1.0
2 2015-11-13 26.20 0.0
3 2015-11-16 26.19 0.0
4 2015-11-17 25.51 0.0
5 2015-11-18 26.31 1.0
6 2015-11-19 26.00 0.0
7 2015-11-20 27.01 1.0
8 2015-11-23 25.60 0.0
9 2015-11-24 27.00 1.0
10 2015-11-25 26.49 0.0
For further illustration, here is what I did on Microsoft Excel, which is exactly what I am trying to achieve with Python: ExcelExample
First Attempt:
for i in dftest['High']:
if dftest['High'][i] > dftest['High'][i-1] and dftest['High'][i] > dftest['High'][i+1]:
dftest['higher'] = 1
else:
dftest['higher'] = 0
This results in the error: "TypeError: cannot do label indexing on class 'pandas.core.indexes.range.RangeIndex' with these indexers [25.9] of class 'float'"
Second Attempt: Here I try to make use of '.shift()'.
for i in dftest['High']:
if dftest['High'] > dftest['High'].shift(1) and dftest['High'] > dftest['High'].shift(-1):
dftest['higher'] = 1
else:
dftest['higher'] = 0
This results in error: "TypeError: unsupported operand type(s) for &: 'float' and 'float'"
Third Attempt (similar to the second attempt): Used '&' operator instead of 'and'.
Results in error: "TypeError: cannot do label indexing on class 'pandas.core.indexes.range.RangeIndex' with these indexers [25.9] of class 'float'"
Would appreciate any sort of help!
Edit: I would really appreciate if the solution provided could easily be alternated if I wanted to find the highest of 5 or 7 or 9 consecutive numbers. Thank you again!
Upvotes: 2
Views: 87
Reputation: 765
df1.assign(higher=df1.High.rolling(3,min_periods=3,center=True).max().eq(df1.High).astype(int))
Date High higher
0 2015-11-11 25.90 0.0
1 2015-11-12 27.12 1.0
2 2015-11-13 26.20 0.0
3 2015-11-16 26.19 0.0
4 2015-11-17 25.51 0.0
5 2015-11-18 26.31 1.0
6 2015-11-19 26.00 0.0
7 2015-11-20 27.01 1.0
8 2015-11-23 25.60 0.0
9 2015-11-24 27.00 1.0
10 2015-11-25 26.49 0.0
Upvotes: 0
Reputation: 21
In the end, I decided to use this:
hf_three = []
try:
for i in df.index:
if df.loc[i]['High']>df.loc[i+1]['High'] and df.loc[i]['High']>df.loc[i-1]['High']:
hf_three.append(1)
else:
hf_three.append(0)
except ValueError: # To handle cases where 'i-1' doesn't exist.
hf_three.append(0)
except KeyError: # To handle cases where 'i+1' doesn't exist
hf_three.append(0)
df['higher'] = hf_three
Though this works, if I replace '>' for '<', it no longer works and will not even append. Solutions to this will be greatly appreciated!
Upvotes: 0
Reputation: 3086
You can use .pct_change() to see if values are increasing or decreasing, and then use np.where() to replace values on condition.
df = pd.read_clipboard()
df['higher'] = df['High'].pct_change()
df['higher'] = np.where(df['higher'] > 0, 1.0, 0.0)
print(df)
Date High higher
0 2015-11-11 25.90 0.0
1 2015-11-12 27.12 1.0
2 2015-11-13 26.20 0.0
3 2015-11-16 26.19 0.0
4 2015-11-17 25.51 0.0
5 2015-11-18 26.31 1.0
6 2015-11-19 26.00 0.0
7 2015-11-20 27.01 1.0
8 2015-11-23 25.60 0.0
9 2015-11-24 27.00 1.0
10 2015-11-25 26.49 0.0
For your comment, you can do rolling max.
df['rolling_max'] = df['High'].rolling(5, min_periods=1).max()
print(df)
Date High higher rolling_max
0 2015-11-11 25.90 0.0 25.90
1 2015-11-12 27.12 0.0 27.12
2 2015-11-13 26.20 0.0 27.12
3 2015-11-16 26.19 0.0 27.12
4 2015-11-17 25.51 0.0 27.12
5 2015-11-18 26.31 1.0 27.12
6 2015-11-19 26.00 0.0 26.31
7 2015-11-20 27.01 1.0 27.01
8 2015-11-23 25.60 0.0 27.01
9 2015-11-24 27.00 1.0 27.01
10 2015-11-25 26.49 1.0 27.01
Upvotes: 1
Reputation: 323226
This so called local max
from scipy.signal import argrelextrema
ary=argrelextrema(df.High.values,np.greater)
df['local max']=np.where(df.index.isin(ary[0]),1,0)
df
Date High local max
0 2015-11-11 25.90 0
1 2015-11-12 27.12 1
2 2015-11-13 26.20 0
3 2015-11-16 26.19 0
4 2015-11-17 25.51 0
5 2015-11-18 26.31 1
6 2015-11-19 26.00 0
7 2015-11-20 27.01 1
8 2015-11-23 25.60 0
9 2015-11-24 27.00 1
10 2015-11-25 26.49 0
Upvotes: 1