pm123
pm123

Reputation: 3

Pandas minimum time where high is bigger than current high

I want to know within a pandas dataframe the first minimum time which the "high" is bigger than current high.

So for example, I have a dataframe with the following columns:

df["date"]
df["open"]
df["high"]
df["low"]
df["close"]

I want to know the MINIMUM date where the HIGH is bigger than Current high, this is what I have so far:

import pandas as pd
import numpy as np
df = pd.read_csv("spy10mindata.csv")
df.columns = [x.lower() for x in df.columns]
df["date"] = pd.to_datetime(df['datetime'], dayfirst=True)
df = df.sort_values(["date"], ascending=[True])
df['just_date'] = df['date'].dt.date
df['just_date2'] = df['date'].dt.date
df['just_time'] = df['date'].dt.time
df["numdate"] = df['date'].dt.strftime("%Y%m%d").astype(int)
df["numtime"] = df['date'].dt.strftime("%H%M%S").astype(int)

df["try"] = np.where(df["high"] > df["high")) 

""

But can't get over this problem for this analysis. If you can help I appreciate a lot

Upvotes: 0

Views: 185

Answers (1)

furas
furas

Reputation: 142641

Maybe it can be done with rolling window but I have no idea.

I iterate rows to get current high and create sub_df with all rows after this value.

BTW: I use random to create some example data. Because I use seed() so you should get the same value in every execution - so values are not so random.

import pandas as pd
import random

random.seed(0)

df = pd.DataFrame({
    'date': pd.date_range(start='2020.05.01 12:00', periods=10, freq='d'),
    'high': [random.randint(0, 10) for _ in range(10)],
})

print(df)

df['higher_value'] = None
df['higher_data'] = None
df['higher_index'] = None

for index, row in df.iterrows():
    print('current: row:', index, 'high:', row['high'])

    sub_df = df[index+1:] 
    higher_items = sub_df[ sub_df['high'] > row['high'] ]

    if len(higher_items):
        first = higher_items.iloc[0]
        print(' higher: row:', first.name, 'high:', first['high'])
        df['higher_value'][index] = first['high']
        df['higher_index'][index] = first.name
        df['higher_data'][index] = first['date']

    else:
        print(' higher: None')

    print('---')

print(df)    

Before:

                 date  high
0 2020-05-01 12:00:00     6
1 2020-05-02 12:00:00     6
2 2020-05-03 12:00:00     0
3 2020-05-04 12:00:00     4
4 2020-05-05 12:00:00     8
5 2020-05-06 12:00:00     7
6 2020-05-07 12:00:00     6
7 2020-05-08 12:00:00     4
8 2020-05-09 12:00:00     7
9 2020-05-10 12:00:00     5

After:

                 date  high higher_value          higher_data higher_index
0 2020-05-01 12:00:00     6            8  2020-05-05 12:00:00            4
1 2020-05-02 12:00:00     6            8  2020-05-05 12:00:00            4
2 2020-05-03 12:00:00     0            4  2020-05-04 12:00:00            3
3 2020-05-04 12:00:00     4            8  2020-05-05 12:00:00            4
4 2020-05-05 12:00:00     8         None                 None         None
5 2020-05-06 12:00:00     7         None                 None         None
6 2020-05-07 12:00:00     6            7  2020-05-09 12:00:00            8
7 2020-05-08 12:00:00     4            7  2020-05-09 12:00:00            8
8 2020-05-09 12:00:00     7         None                 None         None
9 2020-05-10 12:00:00     5         None                 None         None

Upvotes: 1

Related Questions