Reputation: 99
I have the following table in pandas. It contains the time and the price of a product.
For analysis purposes, I want two columns holding the next time at which the price has changed by more than $100 in either direction.
E.g. for the 09:19 row, the next price more than $100 higher is at 14:02 and the next price more than $100 lower is at 11:39, so 14:02 and 11:39 should go in the 09:19 row in the respective columns.
Likewise, for the 09:56 row, the next price more than $100 higher is at 14:02 and the next price more than $100 lower is at 12:18, so those two values should go in the 09:56 row.
Table
Time Price Up_Time Down_Time
09:19:00 3252.25
09:24:00 3259.9
09:56:00 3199.4
10:17:00 3222.5
10:43:00 3191.25
11:39:00 3143
12:18:00 2991.7
13:20:00 3196.35
13:26:00 3176.1
13:34:00 3198.85
13:37:00 3260.75
14:00:00 3160.85
14:02:00 3450
14:19:00 3060.5
14:30:00 2968.7
14:31:00 2895.8
14:52:00 2880.7
14:53:00 2901.55
14:55:00 2885.55
14:57:00 2839.05
14:58:00 2871.5
15:00:00 2718.95
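For reproducibility, here is a minimal sketch that builds the table as a DataFrame and checks the 09:19 example by hand. Storing `Time` as plain strings is an assumption; the names `base`, `later`, `first_up` and `first_down` are only illustrative.

```python
import pandas as pd

# Data copied from the table above; Time kept as strings (an assumption).
df = pd.DataFrame({
    'Time': ['09:19:00', '09:24:00', '09:56:00', '10:17:00', '10:43:00',
             '11:39:00', '12:18:00', '13:20:00', '13:26:00', '13:34:00',
             '13:37:00', '14:00:00', '14:02:00', '14:19:00', '14:30:00',
             '14:31:00', '14:52:00', '14:53:00', '14:55:00', '14:57:00',
             '14:58:00', '15:00:00'],
    'Price': [3252.25, 3259.9, 3199.4, 3222.5, 3191.25, 3143.0, 2991.7,
              3196.35, 3176.1, 3198.85, 3260.75, 3160.85, 3450.0, 3060.5,
              2968.7, 2895.8, 2880.7, 2901.55, 2885.55, 2839.05, 2871.5,
              2718.95],
})

# Hand check for the 09:19 row: look only at the rows that come after it.
base = df.loc[0, 'Price']
later = df.iloc[1:]
first_up = later.loc[later['Price'] > base + 100, 'Time'].iloc[0]
first_down = later.loc[later['Price'] < base - 100, 'Time'].iloc[0]
print(first_up, first_down)  # 14:02:00 11:39:00
```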
I am using the following code, which works but takes 15-20 minutes for one dataset.
for i, row in df.iterrows():
    time_up = np.nan
    time_down = np.nan
    for j in range(i+1, len(df)):
        diff = df.iloc[j]['Price'] - row['Price']
        if diff > 100:
            time_up = df.iloc[j]['Time']
        elif diff < -100:
            time_down = df.iloc[j]['Time']
        if not pd.isna(time_up) or not pd.isna(time_down):
            break
    df.at[i, 'Up_Time'] = time_up
    df.at[i, 'Down_Time'] = time_down
Is there a more efficient way to do it?
Upvotes: 2
Views: 76
Reputation: 147216
You do need to compare each row's Price value with all the rows that come after it, so some amount of iteration is necessary. You can do that with apply and a function that uses argmax on a boolean mask to find the first later row whose price change meets the requirement of >100 or <-100 (this relies on the default 0..n-1 index, since the row label is used as a position):
def updown(row, df):
    rownum = row.name
    up = (row['Price'] < df.loc[rownum:, 'Price'] - 100).argmax()
    down = (row['Price'] > df.loc[rownum:, 'Price'] + 100).argmax()
    return (
        df.loc[up + rownum, 'Time'] if up > 0 else pd.NaT,
        df.loc[down + rownum, 'Time'] if down > 0 else pd.NaT
    )

df[['Up_Time', 'Down_Time']] = df.apply(updown, axis=1, result_type='expand', df=df)
Output:
Time Price Up_Time Down_Time
0 09:19:00 3252.25 14:02:00 11:39:00
1 09:24:00 3259.90 14:02:00 11:39:00
2 09:56:00 3199.40 14:02:00 12:18:00
3 10:17:00 3222.50 14:02:00 12:18:00
4 10:43:00 3191.25 14:02:00 12:18:00
5 11:39:00 3143.00 13:37:00 12:18:00
6 12:18:00 2991.70 13:20:00 14:52:00
7 13:20:00 3196.35 14:02:00 14:19:00
8 13:26:00 3176.10 14:02:00 14:19:00
9 13:34:00 3198.85 14:02:00 14:19:00
10 13:37:00 3260.75 14:02:00 14:19:00
11 14:00:00 3160.85 14:02:00 14:19:00
12 14:02:00 3450.00 NaT 14:19:00
13 14:19:00 3060.50 NaT 14:31:00
14 14:30:00 2968.70 NaT 14:57:00
15 14:31:00 2895.80 NaT 15:00:00
16 14:52:00 2880.70 NaT 15:00:00
17 14:53:00 2901.55 NaT 15:00:00
18 14:55:00 2885.55 NaT 15:00:00
19 14:57:00 2839.05 NaT 15:00:00
20 14:58:00 2871.50 NaT 15:00:00
21 15:00:00 2718.95 NaT NaT
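If the per-row apply is still too slow, one possible alternative (a sketch, not the method above) is to do all pairwise comparisons at once with NumPy broadcasting. The helper name `first_move_positions` and its `threshold` parameter are made up for illustration. It builds n x n arrays, so it assumes the dataset is small enough for that to fit in memory; a long series would need to be processed in chunks.

```python
import numpy as np
import pandas as pd

def first_move_positions(price, threshold=100.0):
    """For each i, position of the first later j whose price is more than
    `threshold` above (up) or below (down) price[i]; -1 if there is none."""
    price = np.asarray(price, dtype=float)
    n = len(price)
    # diff[i, j] = price[j] - price[i]; this n x n array is the memory cost.
    diff = price[None, :] - price[:, None]
    # Strict upper triangle: only rows that come after row i are candidates.
    later = np.triu(np.ones((n, n), dtype=bool), k=1)

    def first_true(mask):
        # argmax on booleans gives the first True, or 0 when a row has none,
        # so rows without any hit are mapped to -1 explicitly.
        return np.where(mask.any(axis=1), mask.argmax(axis=1), -1)

    up = first_true((diff > threshold) & later)
    down = first_true((diff < -threshold) & later)
    return up, down

# Data copied from the question's table; Time kept as strings (an assumption).
df = pd.DataFrame({
    'Time': ['09:19:00', '09:24:00', '09:56:00', '10:17:00', '10:43:00',
             '11:39:00', '12:18:00', '13:20:00', '13:26:00', '13:34:00',
             '13:37:00', '14:00:00', '14:02:00', '14:19:00', '14:30:00',
             '14:31:00', '14:52:00', '14:53:00', '14:55:00', '14:57:00',
             '14:58:00', '15:00:00'],
    'Price': [3252.25, 3259.9, 3199.4, 3222.5, 3191.25, 3143.0, 2991.7,
              3196.35, 3176.1, 3198.85, 3260.75, 3160.85, 3450.0, 3060.5,
              2968.7, 2895.8, 2880.7, 2901.55, 2885.55, 2839.05, 2871.5,
              2718.95],
})

up_pos, down_pos = first_move_positions(df['Price'])
time = df['Time'].to_numpy(dtype=object)
# Map positions back to times; rows with no qualifying later price get NaT.
df['Up_Time'] = pd.Series([time[j] if j >= 0 else pd.NaT for j in up_pos],
                          index=df.index, dtype=object)
df['Down_Time'] = pd.Series([time[j] if j >= 0 else pd.NaT for j in down_pos],
                            index=df.index, dtype=object)
```

Unlike `updown`, this works on positions rather than index labels, so it does not depend on the DataFrame having a default index.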
Upvotes: 2