Reputation: 3
I want to know within a pandas dataframe the first minimum time which the "high" is bigger than current high.
So for example, I have a dataframe with the following columns:
df["date"]
df["open"]
df["high"]
df["low"]
df["close"]
I want to know the MINIMUM date where the HIGH is bigger than Current high, this is what I have so far:
import pandas as pd
import numpy as np
df = pd.read_csv("spy10mindata.csv")
df.columns = [x.lower() for x in df.columns]
df["date"] = pd.to_datetime(df['datetime'], dayfirst=True)
df = df.sort_values(["date"], ascending=[True])
df['just_date'] = df['date'].dt.date
df['just_date2'] = df['date'].dt.date
df['just_time'] = df['date'].dt.time
df["numdate"] = df['date'].dt.strftime("%Y%m%d").astype(int)
df["numtime"] = df['date'].dt.strftime("%H%M%S").astype(int)
df["try"] = np.where(df["high"] > df["high"))
""
But can't get over this problem for this analysis. If you can help I appreciate a lot
Upvotes: 0
Views: 185
Reputation: 142641
Maybe it can be done with rolling window but I have no idea.
I iterate rows to get current high
and create sub_df
with all rows after this value.
BTW: I use random
to create some example data. Because I use seed()
so you should get the same value in every execution - so values are not so random.
import pandas as pd
import random
random.seed(0)
df = pd.DataFrame({
'date': pd.date_range(start='2020.05.01 12:00', periods=10, freq='d'),
'high': [random.randint(0, 10) for _ in range(10)],
})
print(df)
df['higher_value'] = None
df['higher_data'] = None
df['higher_index'] = None
for index, row in df.iterrows():
print('current: row:', index, 'high:', row['high'])
sub_df = df[index+1:]
higher_items = sub_df[ sub_df['high'] > row['high'] ]
if len(higher_items):
first = higher_items.iloc[0]
print(' higher: row:', first.name, 'high:', first['high'])
df['higher_value'][index] = first['high']
df['higher_index'][index] = first.name
df['higher_data'][index] = first['date']
else:
print(' higher: None')
print('---')
print(df)
Before:
date high
0 2020-05-01 12:00:00 6
1 2020-05-02 12:00:00 6
2 2020-05-03 12:00:00 0
3 2020-05-04 12:00:00 4
4 2020-05-05 12:00:00 8
5 2020-05-06 12:00:00 7
6 2020-05-07 12:00:00 6
7 2020-05-08 12:00:00 4
8 2020-05-09 12:00:00 7
9 2020-05-10 12:00:00 5
After:
date high higher_value higher_data higher_index
0 2020-05-01 12:00:00 6 8 2020-05-05 12:00:00 4
1 2020-05-02 12:00:00 6 8 2020-05-05 12:00:00 4
2 2020-05-03 12:00:00 0 4 2020-05-04 12:00:00 3
3 2020-05-04 12:00:00 4 8 2020-05-05 12:00:00 4
4 2020-05-05 12:00:00 8 None None None
5 2020-05-06 12:00:00 7 None None None
6 2020-05-07 12:00:00 6 7 2020-05-09 12:00:00 8
7 2020-05-08 12:00:00 4 7 2020-05-09 12:00:00 8
8 2020-05-09 12:00:00 7 None None None
9 2020-05-10 12:00:00 5 None None None
Upvotes: 1