Reputation: 510
I have a sample dataframe(df) like below:
Date_Time Open High Low Close UOD VWB
20 2020-07-01 10:30:00 10298.85 10299.90 10287.85 10299.90 UP 3
21 2020-07-01 10:35:00 10301.40 10310.00 10299.15 10305.75 UP 3
22 2020-07-01 10:40:00 10305.75 10305.75 10285.50 10290.00 DOWN 3
24 2020-07-01 10:45:00 10290.00 10291.20 10277.65 10282.65 DOWN 0
25 2020-07-01 10:50:00 10282.30 10289.80 10278.00 10282.00 DOWN 3
26 2020-07-01 10:55:00 10280.10 10295.00 10279.80 10291.50 UP 3
27 2020-07-01 11:00:00 10290.00 10299.95 10287.30 10297.55 UP 3
28 2020-07-01 11:05:00 10296.70 10306.30 10294.50 10299.40 UP 3
29 2020-07-01 11:10:00 10299.95 10301.10 10291.50 10292.00 DOWN 0
30 2020-07-01 11:15:00 10293.05 10298.70 10286.00 10291.55 DOWN 3
31 2020-07-01 11:20:00 10292.00 10298.70 10286.00 10351.45 DOWN 1
I have below conditions:
- Check for df['VWB'] == 0 & df['UOD'] == "DOWN" & get the corresponding Open value (= 10290.00 in my example)
- Then Find the first occurrence of Close value greater than this Open value (10290.00) after that row.
- Find the time Difference between two rors with Condition 1 (df['VWB'] == 0 & df['UOD'] == "DOWN") and Condition 2 (first occurrence) in to another column (TD).
I want my desired outout as below with Valid Column
Date_Time Open High Low Close UOD VWB Valid TD
20 2020-07-01 10:30:00 10298.85 10299.90 10287.85 10299.90 UP 3 0
21 2020-07-01 10:35:00 10301.40 10310.00 10299.15 10305.75 UP 3 0
22 2020-07-01 10:40:00 10305.75 10305.75 10285.50 10290.00 DOWN 3 0
23 2020-07-01 10:45:00 10290.00 10291.20 10277.65 10282.65 DOWN 0 0
25 2020-07-01 10:50:00 10282.30 10289.80 10278.00 10282.00 DOWN 3 0
26 2020-07-01 10:55:00 10280.10 10295.00 10279.80 10291.50 UP 3 1 600 <<= first occurrence
27 2020-07-01 11:00:00 10290.00 10299.95 10287.30 10297.55 UP 3 0
28 2020-07-01 11:05:00 10296.70 10306.30 10294.50 10299.40 UP 3 0
29 2020-07-01 11:10:00 10299.95 10301.10 10291.50 10292.00 DOWN 0 0
30 2020-07-01 11:15:00 10293.05 10298.70 10286.00 10291.55 DOWN 3 0
31 2020-07-01 11:20:00 10292.00 10298.70 10286.00 10351.45 DOWN 1 1 600 <<= first occurrence
Upvotes: 0
Views: 547
Reputation: 75080
Here is an approach, not sure if this is the best way and might be possible to be optimized(comments inline)
#gets open value per the condition
open_val = df.loc[(df['VWB'] == 0) & (df['UOD'] == "DOWN"),'Open']
#check where open value > df['Close'] and create groups
c = df['Close'].gt(open_val.reindex(df.index,method='ffill'))
a = np.digitize(df.index,open_val.index)
#get first index in each group and set the Valid column
valid_idx = c.groupby(a).idxmax()
df['Valid'] = c.loc[valid_idx].reindex(df.index,fill_value=False).astype(int)
#calculate time difference and mask where consition matches
TD = (df['Date_Time'] -
df.loc[open_val.index,'Date_Time'].reindex(df.index,method='ffill')).dt.total_seconds()
df['TD'] = TD.where(df['Valid'].eq(1))
print(df[['Date_Time','Open','Close','UOD','VWB','Valid','TD']])
Date_Time Open Close UOD VWB Valid TD
20 2020-07-01 10:30:00 10298.85 10299.90 UP 3 0 NaN
21 2020-07-01 10:35:00 10301.40 10305.75 UP 3 0 NaN
22 2020-07-01 10:40:00 10305.75 10290.00 DOWN 3 0 NaN
24 2020-07-01 10:45:00 10290.00 10282.65 DOWN 0 0 NaN
25 2020-07-01 10:50:00 10282.30 10282.00 DOWN 3 0 NaN
26 2020-07-01 10:55:00 10280.10 10291.50 UP 3 1 600.0
27 2020-07-01 11:00:00 10290.00 10297.55 UP 3 0 NaN
28 2020-07-01 11:05:00 10296.70 10299.40 UP 3 0 NaN
29 2020-07-01 11:10:00 10299.95 10292.00 DOWN 0 0 NaN
30 2020-07-01 11:15:00 10293.05 10291.55 DOWN 3 0 NaN
31 2020-07-01 11:20:00 10292.00 10351.45 DOWN 1 1 600.0
Upvotes: 2
Reputation: 491
Sample Df
Open Close UOD VWB
0 10298.85 10287.85 UP 3
1 10301.40 10299.15 UP 3
2 10305.75 10290.00 DOWN 3
3 10290.00 10282.65 DOWN 0
4 10282.30 10282.00 DOWN 3
5 10280.10 10291.50 UP 3
Code
a = df.index[(df["UOD"] == "DOWN") & (df["VWB"] == 0)].tolist() ##It will return all index which satisfy the above 1st condition
print(a)
output
[3]
Code
loc = []
for i in a:
for j in range(i+1, len(df)):
if df.iloc[i]["Open"] < df.iloc[j]["Close"]:
loc.append(j)
break
valid = [1 if i in loc else 0 for i in range(0, len(df))]
df["Valid"] = valid
Final Output:
Open Close UOD VWB Valid
0 10298.85 10287.85 UP 3 0
1 10301.40 10299.15 UP 3 0
2 10305.75 10290.00 DOWN 3 0
3 10290.00 10282.65 DOWN 0 0
4 10282.30 10282.00 DOWN 3 0
5 10280.10 10291.50 UP 3 1
Upvotes: 1