Reza Heidari

Reputation: 1276

How to update pandas DataFrame based on the previous row information

I have the following pandas DataFrame. For each row, I want to check whether the HH value is greater than the previous row's High value. If it is, I want to write that value into the previous row's HH and replace the current row's HH with None.

How can I check whether HH is greater than the previous row's High and update the rows as described above?


Please note that I don't want to shift all the data in a column (so I don't think shift is the solution). I only want to change one specific value based on the previous row's "High" value.

About the program:

I'm trying to create a program that finds the minima and maxima of a given financial market, and I'm using the 'peakdetect' library: https://pypi.org/project/peakdetect/

It simply generates a 2D list of minima and maxima:

density = 2
# Temp ref to the array of minima and maxima
high_arr = peakdetect(y_axis=clean_dataframe['High'],
                      x_axis=clean_dataframe.index, lookahead=density)
low_arr = peakdetect(y_axis=clean_dataframe['Low'],
                     x_axis=clean_dataframe.index, lookahead=density)

# first index is always for maxima
_hh = pd.DataFrame(high_arr[0])
_hh = _hh.rename(columns={0: 'Index', 1: 'HH'})

# second index is always for minima
_ll = pd.DataFrame(low_arr[1])
_ll = _ll.rename(columns={0: 'Index', 1: 'LL'})

# join all minima and maxima onto clean_dataframe
full_df = clean_dataframe.join(_hh.set_index('Index')).join(_ll.set_index('Index'))


The problem is that some LL (valley) values are not accurate: sometimes the previous row's Low price is the correct LL, so I have to compare the two and move the LL to that row, as shown in the picture.

Upvotes: 2

Views: 2921

Answers (1)

Joe Ferndz

Reputation: 8508

To help you understand how shift(-1) works, please review the solution below. I looked at the image and recreated the raw DataFrame from it.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Dates':['2021-02-04 19:00:00','2021-02-04 20:00:00',
                            '2021-02-04 21:00:00','2021-02-04 22:00:00',
                            '2021-02-04 23:00:00','2021-02-05 00:00:00',
                            '2021-02-05 01:00:00','2021-02-05 02:00:00'],
                   'Close':[1.19661,1.19660,1.19611,1.19643,1.19664,
                            1.19692,1.19662,1.19542],
                   'High' :[1.19679,1.19678,1.19680,1.19679,1.19688,
                            1.19721,1.19694,1.19682],
                   'Low'  :[1.19577,1.19637,1.19604,1.19590,1.19632,
                            1.19634,1.19622,1.19537],
                   'Open' :[1.19630,1.19662,1.19665,1.19613,1.19646,
                            1.19662,1.19690,1.19665],
                   'Status':['ok']*8,
                   'Volume':[2579,1858,1399,788,1437,2435,2898,2641],
                   'HH'   :[np.nan]*5+[1.19721]+[np.nan]*2,  # np.nan: the np.NaN alias was removed in NumPy 2.0
                   'LL'   :[np.nan]*8})
print (df)

# make a copy of df['High'] in df['NewHigh']
df['NewHigh'] = df['High']

# if the next row's 'HH' is greater than this row's 'High', set 'NewHigh' to the next row's 'HH'
df.loc[df['HH'].shift(-1) > df['High'],'NewHigh'] = df['HH'].shift(-1)

print (df[['Dates','High','HH','NewHigh']])

The output of this will be:

                 Dates     High       HH  NewHigh
0  2021-02-04 19:00:00  1.19679      NaN  1.19679
1  2021-02-04 20:00:00  1.19678      NaN  1.19678
2  2021-02-04 21:00:00  1.19680      NaN  1.19680
3  2021-02-04 22:00:00  1.19679      NaN  1.19679
4  2021-02-04 23:00:00  1.19688      NaN  1.19721 # <- This got updated
5  2021-02-05 00:00:00  1.19721  1.19721  1.19721
6  2021-02-05 01:00:00  1.19694      NaN  1.19694
7  2021-02-05 02:00:00  1.19682      NaN  1.19682

Note: I created a new column only to show you the changes. To update High directly, use 'High' instead of 'NewHigh' on the df.loc line. That should do the trick.
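
If you want the behaviour exactly as worded in the question (move the HH value up to the previous row and blank out the current row's HH), here is a minimal sketch of one way to do it. It assumes the same High and HH values as the sample above; the names move, target and moved_values are only illustrative, and I have read the condition literally as "HH of this row > High of the previous row".

```python
import numpy as np
import pandas as pd

# Only the two columns involved, with the same values as the sample data.
df = pd.DataFrame({
    'High': [1.19679, 1.19678, 1.19680, 1.19679, 1.19688, 1.19721, 1.19694, 1.19682],
    'HH': [np.nan] * 5 + [1.19721] + [np.nan] * 2,
})

# True where this row's HH exceeds the previous row's High.
# Comparisons against NaN are False, so rows without an HH are never flagged.
move = df['HH'] > df['High'].shift(1)

# The same flags, aligned with the row just before each flagged row.
target = move.shift(-1, fill_value=False)

# Capture the values to move before modifying the column.
moved_values = df['HH'].shift(-1)

# Clear the flagged rows first, then write the values one row up.
df.loc[move, 'HH'] = np.nan
df.loc[target, 'HH'] = moved_values[target]

print(df)
```

Here shift is only used to build the comparison and the mask, so no column is shifted as a whole; only the flagged rows change. The same pattern should carry over to LL by comparing against 'Low' with the condition reversed, but adjust the condition to whatever your rule really is.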

Upvotes: 3
