I have the following DataFrame in pandas. I want to check whether the HH value is greater than the previous row's High value, and if it is, update the previous row's HH value and replace the current HH with None.
How can I check whether HH > High of the previous row and update the data as described above?
Please note that I don't want to shift all the data in a column (so I think shift is not the solution); I just want to change one specific value based on the previous row's "High" value.
About the program:
I'm trying to create a program that finds the minima and maxima of a given financial market, and I'm using the 'peakdetect' library https://pypi.org/project/peakdetect/
It simply generates a 2D list of minima and maxima:
import pandas as pd
from peakdetect import peakdetect

density = 2
# Temp ref to the array of minima and maxima
high_arr = peakdetect(y_axis=clean_dataframe['High'],
                      x_axis=clean_dataframe.index, lookahead=density)
low_arr = peakdetect(y_axis=clean_dataframe['Low'],
                     x_axis=clean_dataframe.index, lookahead=density)
# first index is always for maxima
_hh = pd.DataFrame(high_arr[0])
_hh = _hh.rename(columns={0: 'Index', 1: 'HH'})
# second index is always for minima
_ll = pd.DataFrame(low_arr[1])
_ll = _ll.rename(columns={0: 'Index', 1: 'LL'})
# join all minima and maxima to the original DataFrame
full_df = clean_dataframe.join(_hh.set_index('Index')).join(_ll.set_index('Index'))
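For reference, the join step can be tried without the library. This is a minimal sketch that assumes peakdetect's return shape is `[max_peaks, min_peaks]`, where each entry is a list of `[x, y]` pairs; the `high_arr`/`low_arr` values below are made-up stand-ins, not real peakdetect output.

```python
import pandas as pd

# Hypothetical stand-ins for peakdetect output: [max_peaks, min_peaks],
# each a list of [x, y] pairs (x = index label, y = peak value)
high_arr = [[[5, 1.19721]], [[1, 1.19678]]]
low_arr = [[[1, 1.19637]], [[3, 1.19590]]]

clean_dataframe = pd.DataFrame({
    "High": [1.19679, 1.19678, 1.19680, 1.19679, 1.19688, 1.19721, 1.19694, 1.19682],
    "Low": [1.19577, 1.19637, 1.19604, 1.19590, 1.19632, 1.19634, 1.19622, 1.19537],
})

# Maxima of the High series become HH, minima of the Low series become LL
hh = pd.DataFrame(high_arr[0], columns=["Index", "HH"]).set_index("Index")
ll = pd.DataFrame(low_arr[1], columns=["Index", "LL"]).set_index("Index")

# Left join on the index: rows without a detected peak get NaN
full_df = clean_dataframe.join(hh).join(ll)
print(full_df)
```

Passing `columns=` to the constructor does the same job as the `rename` calls in the original code.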
[Image: clear_dataframe result]
The problem is that some LL (valley) values are not accurate: sometimes the previous row's Low price is the correct LL, so I have to check for this and move the LL value to that row, as shown in the picture.
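The LL correction described above can be expressed with `shift` without moving any column. This is a minimal sketch under an assumed rule: if the previous row's Low is lower than a detected LL, the valley belongs to the previous row, so LL moves there (set to that row's Low) and the current LL is cleared. The small DataFrame is made up for illustration, and detected valleys are assumed not to sit in adjacent rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Low": [1.19604, 1.19590, 1.19632, 1.19622, 1.19537],
    "LL": [np.nan, np.nan, 1.19632, np.nan, np.nan],
})

# Rows whose detected LL is higher than the previous row's Low
# (NaN comparisons evaluate to False, so rows without LL are skipped)
mask = df["LL"] > df["Low"].shift(1)

# Look one row ahead: flag the row *before* each flagged row
prev_rows = mask.shift(-1, fill_value=False)

# Move the valley to the previous row, then clear the original position
df.loc[prev_rows, "LL"] = df.loc[prev_rows, "Low"]
df.loc[mask, "LL"] = np.nan
print(df)
```

Both masks are computed before any assignment, so the writes do not influence which rows are selected.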
To help you understand how shift(-1) works, please review the solution below. I recreated the raw DataFrame from the image.
import pandas as pd
import numpy as np

df = pd.DataFrame({'Dates': ['2021-02-04 19:00:00', '2021-02-04 20:00:00',
                             '2021-02-04 21:00:00', '2021-02-04 22:00:00',
                             '2021-02-04 23:00:00', '2021-02-05 00:00:00',
                             '2021-02-05 01:00:00', '2021-02-05 02:00:00'],
                   'Close': [1.19661, 1.19660, 1.19611, 1.19643, 1.19664,
                             1.19692, 1.19662, 1.19542],
                   'High': [1.19679, 1.19678, 1.19680, 1.19679, 1.19688,
                            1.19721, 1.19694, 1.19682],
                   'Low': [1.19577, 1.19637, 1.19604, 1.19590, 1.19632,
                           1.19634, 1.19622, 1.19537],
                   'Open': [1.19630, 1.19662, 1.19665, 1.19613, 1.19646,
                            1.19662, 1.19690, 1.19665],
                   'Status': ['ok'] * 8,
                   'Volume': [2579, 1858, 1399, 788, 1437, 2435, 2898, 2641],
                   'HH': [np.nan] * 5 + [1.19721] + [np.nan] * 2,
                   'LL': [np.nan] * 8})
print(df)

# make a copy of df['High'] into df['NewHigh']
df['NewHigh'] = df['High']
# if the next row's 'HH' is greater than this row's 'High', update 'NewHigh' with that 'HH' value
df.loc[df['HH'].shift(-1) > df['High'], 'NewHigh'] = df['HH'].shift(-1)
print(df[['Dates', 'High', 'HH', 'NewHigh']])
The output of this will be:
                 Dates     High       HH  NewHigh
0  2021-02-04 19:00:00  1.19679      NaN  1.19679
1  2021-02-04 20:00:00  1.19678      NaN  1.19678
2  2021-02-04 21:00:00  1.19680      NaN  1.19680
3  2021-02-04 22:00:00  1.19679      NaN  1.19679
4  2021-02-04 23:00:00  1.19688      NaN  1.19721   # <- This got updated
5  2021-02-05 00:00:00  1.19721  1.19721  1.19721
6  2021-02-05 01:00:00  1.19694      NaN  1.19694
7  2021-02-05 02:00:00  1.19682      NaN  1.19682
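On the concern in the question that shift moves all the data in a column: `shift` returns a new Series and leaves the original column untouched, so here it only acts as a "look at the next row" lookup. A minimal sketch with a three-row Series:

```python
import numpy as np
import pandas as pd

hh = pd.Series([np.nan, 1.19721, np.nan])

# Row i of the result holds the value from row i + 1; the last row becomes NaN
next_hh = hh.shift(-1)

print(next_hh.tolist())  # [1.19721, nan, nan]
print(hh.tolist())       # original is unchanged: [nan, 1.19721, nan]
```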
Note: I created a new column only to show you the changes. You can update High directly: on the df.loc line, use 'High' instead of 'NewHigh'. That should do the trick.
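Under one reading of the question, the row that supplied the HH value should also have its HH replaced with None/NaN, which the code above does not do. This is a minimal sketch of that interpretation: it writes into High in place, as the note suggests, and then blanks the source row's HH. The four-row DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "High": [1.19679, 1.19688, 1.19721, 1.19694],
    "HH": [np.nan, np.nan, 1.19721, np.nan],
})

next_hh = df["HH"].shift(-1)   # HH of the following row
update = next_hh > df["High"]  # rows whose High is below the next row's HH

# Compute both masks before writing, so the assignments can't affect them
blank = update.shift(1, fill_value=False)  # the rows those HH values came from

df.loc[update, "High"] = next_hh[update]  # update the previous row's High in place
df.loc[blank, "HH"] = np.nan              # replace the current HH with NaN
print(df)
```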