rakesh

Reputation: 99

Better option than pandas iterrows

I have the following table in pandas. It contains a time and the price of a product at that time.

For analysis purposes, I want two extra columns holding the next time at which the price has moved by more than $100 in either direction.

For example, for the 09:19 row (price 3252.25), the next price more than $100 higher is at 14:02 and the next price more than $100 lower is at 11:39, so 14:02 and 11:39 should go into the Up_Time and Down_Time columns of the 09:19 row.

Similarly, for the 09:56 row (price 3199.4), the next price more than $100 higher is at 14:02 and the next price more than $100 lower is at 12:18, so those two values go into the 09:56 row.

Table
Time        Price    Up_Time   Down_Time
09:19:00    3252.25     
09:24:00    3259.9      
09:56:00    3199.4      
10:17:00    3222.5      
10:43:00    3191.25     
11:39:00    3143        
12:18:00    2991.7      
13:20:00    3196.35     
13:26:00    3176.1      
13:34:00    3198.85     
13:37:00    3260.75     
14:00:00    3160.85     
14:02:00    3450        
14:19:00    3060.5      
14:30:00    2968.7      
14:31:00    2895.8      
14:52:00    2880.7      
14:53:00    2901.55     
14:55:00    2885.55     
14:57:00    2839.05     
14:58:00    2871.5      
15:00:00    2718.95     
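To make the example reproducible, the table can be loaded into a DataFrame as below. This is a minimal sketch: the values are copied from the table, and keeping Time as plain strings is an assumption, since the question does not say which dtype the real column has.

```python
import pandas as pd

# Values copied from the question's table; Time is kept as strings here
# (an assumption: the real column may hold datetime or time objects).
times = [
    "09:19:00", "09:24:00", "09:56:00", "10:17:00", "10:43:00", "11:39:00",
    "12:18:00", "13:20:00", "13:26:00", "13:34:00", "13:37:00", "14:00:00",
    "14:02:00", "14:19:00", "14:30:00", "14:31:00", "14:52:00", "14:53:00",
    "14:55:00", "14:57:00", "14:58:00", "15:00:00",
]
prices = [
    3252.25, 3259.9, 3199.4, 3222.5, 3191.25, 3143, 2991.7, 3196.35, 3176.1,
    3198.85, 3260.75, 3160.85, 3450, 3060.5, 2968.7, 2895.8, 2880.7, 2901.55,
    2885.55, 2839.05, 2871.5, 2718.95,
]

# Default RangeIndex (0..21), which both the loop and the answer rely on.
df = pd.DataFrame({"Time": times, "Price": prices})
```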

I am using the following code, which works but takes 15-20 minutes per dataset.

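import numpy as np
import pandas as pd

for i, row in df.iterrows():
    time_up = np.nan
    time_down = np.nan

    # Scan the later rows, keeping only the first crossing in each direction.
    for j in range(i+1, len(df)):
        diff = df.iloc[j]['Price'] - row['Price']
        if pd.isna(time_up) and diff > 100:
            time_up = df.iloc[j]['Time']
        elif pd.isna(time_down) and diff < -100:
            time_down = df.iloc[j]['Time']

        # Stop only once both columns have a value.
        if not pd.isna(time_up) and not pd.isna(time_down):
            break

    df.at[i, 'Up_Time'] = time_up
    df.at[i, 'Down_Time'] = time_down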

Is there a more efficient way to do this?
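One likely cost in the loop above is the per-element df.iloc[j][...] lookup, which builds a new Series for every row it touches. Below is a minimal sketch of the same forward scan over plain NumPy arrays; next_moves_loop and its threshold parameter are hypothetical names. It records the first crossing in each direction and keeps scanning until both are found, which matches the expected values described in the question, and it uses None for "no such later row". The worst case is still quadratic.

```python
import numpy as np


def next_moves_loop(times, prices, threshold=100.0):
    """For each row, find the first later time whose price is more than
    `threshold` above (up) or below (down) the current price.

    Same forward scan as the iterrows loop, but over plain NumPy arrays.
    Worst case is still O(n**2); None marks "no such later row".
    """
    times = np.asarray(times, dtype=object)
    prices = np.asarray(prices, dtype=float)
    n = len(prices)
    up = np.full(n, None, dtype=object)
    down = np.full(n, None, dtype=object)

    for i in range(n):
        base = prices[i]
        for j in range(i + 1, n):
            diff = prices[j] - base
            # Keep only the first crossing in each direction.
            if up[i] is None and diff > threshold:
                up[i] = times[j]
            elif down[i] is None and diff < -threshold:
                down[i] = times[j]
            # Stop once both directions have been found.
            if up[i] is not None and down[i] is not None:
                break
    return up, down


# A few rows from the question's table (09:19, 09:56, 11:39, 12:18, 14:02).
sample_times = ["09:19:00", "09:56:00", "11:39:00", "12:18:00", "14:02:00"]
sample_prices = [3252.25, 3199.4, 3143.0, 2991.7, 3450.0]
up, down = next_moves_loop(sample_times, sample_prices)
```

On the full DataFrame this would be called as df["Up_Time"], df["Down_Time"] = next_moves_loop(df["Time"], df["Price"]).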

Upvotes: 2

Views: 76

Answers (1)

Nick

Reputation: 147216

You do need to compare each row's Price value with all the rows that come after it, so some amount of iteration is necessary. You can do that with apply and a function that builds a boolean mask and uses argmax to find the first later value meeting the change requirement of >100 or <-100:
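To see why this works, here is the trick in isolation, using the 09:19 row and the five rows after it from the table. argmax on a boolean Series returns the position of the first True, and 0 when every value is False; because the mask starts at the current row, whose own comparison is always False, a result of 0 can safely be read as "not found".

```python
import pandas as pd

# The 09:19 row followed by the next five rows of the question's table.
prices = pd.Series([3252.25, 3259.9, 3199.4, 3222.5, 3191.25, 3143.0])
current = prices.iloc[0]

# True where a price is more than 100 below the current one.
down_mask = current > prices + 100
first_down = down_mask.argmax()  # position of the first True (the 11:39 row)

# No price here is more than 100 above, so this mask is all False and
# argmax falls back to 0, i.e. the current row itself: read 0 as "not found".
up_mask = current < prices - 100
first_up = up_mask.argmax()
```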

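import pandas as pd

def updown(row, df):
    # Assumes df has a default RangeIndex, so row.name is also the row position.
    rownum = row.name
    # Masks cover the current row and every later row; argmax gives the position
    # of the first True. The current row's own entry is always False, so a
    # result of 0 means "no such later row".
    up = (row['Price'] < df.loc[rownum:, 'Price'] - 100).argmax()
    down = (row['Price'] > df.loc[rownum:, 'Price'] + 100).argmax()
    return (
        df.loc[up + rownum, 'Time'] if up > 0 else pd.NaT,
        df.loc[down + rownum, 'Time'] if down > 0 else pd.NaT
    )

# Extra keyword arguments to apply (here df=df) are passed through to updown.
df[['Up_Time', 'Down_Time']] = df.apply(updown, axis=1, result_type='expand', df=df)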

Output:

        Time    Price   Up_Time Down_Time
0   09:19:00  3252.25  14:02:00  11:39:00
1   09:24:00  3259.90  14:02:00  11:39:00
2   09:56:00  3199.40  14:02:00  12:18:00
3   10:17:00  3222.50  14:02:00  12:18:00
4   10:43:00  3191.25  14:02:00  12:18:00
5   11:39:00  3143.00  13:37:00  12:18:00
6   12:18:00  2991.70  13:20:00  14:52:00
7   13:20:00  3196.35  14:02:00  14:19:00
8   13:26:00  3176.10  14:02:00  14:19:00
9   13:34:00  3198.85  14:02:00  14:19:00
10  13:37:00  3260.75  14:02:00  14:19:00
11  14:00:00  3160.85  14:02:00  14:19:00
12  14:02:00  3450.00       NaT  14:19:00
13  14:19:00  3060.50       NaT  14:31:00
14  14:30:00  2968.70       NaT  14:57:00
15  14:31:00  2895.80       NaT  15:00:00
16  14:52:00  2880.70       NaT  15:00:00
17  14:53:00  2901.55       NaT  15:00:00
18  14:55:00  2885.55       NaT  15:00:00
19  14:57:00  2839.05       NaT  15:00:00
20  14:58:00  2871.50       NaT  15:00:00
21  15:00:00  2718.95       NaT       NaT
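If the dataset is small enough, the same comparisons can be made for all rows at once with NumPy broadcasting instead of one apply call per row. This is a sketch assuming a default RangeIndex and the Time/Price columns from the question; next_moves_broadcast is a hypothetical name. It builds n x n matrices, so memory grows with the square of the row count and it is only practical for a few thousand rows.

```python
import numpy as np
import pandas as pd


def next_moves_broadcast(df, threshold=100.0):
    """Compare every row with every later row at once.

    Builds n x n boolean matrices, so memory grows as n**2.
    Returns object arrays of times, with None where no later row qualifies.
    """
    prices = df["Price"].to_numpy(dtype=float)
    times = df["Time"].to_numpy()
    n = len(prices)

    # diff[i, j] = prices[j] - prices[i]
    diff = prices[None, :] - prices[:, None]
    # Keep only pairs where j comes strictly after i.
    later = np.triu(np.ones((n, n), dtype=bool), k=1)

    up_mask = (diff > threshold) & later
    down_mask = (diff < -threshold) & later

    # argmax finds the first True in each row; any() says whether one exists.
    up = np.where(up_mask.any(axis=1), times[up_mask.argmax(axis=1)], None)
    down = np.where(down_mask.any(axis=1), times[down_mask.argmax(axis=1)], None)
    return up, down


sample = pd.DataFrame({
    "Time": ["09:19:00", "09:56:00", "11:39:00", "12:18:00", "14:02:00"],
    "Price": [3252.25, 3199.4, 3143.0, 2991.7, 3450.0],
})
up, down = next_moves_broadcast(sample)
sample["Up_Time"], sample["Down_Time"] = up, down
```

The any() check plays the same role as the up > 0 test in updown: here the diagonal is masked out, so position 0 is a valid answer and cannot double as "not found".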

Upvotes: 2
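For long datasets where comparing every pair of rows is too slow, a different technique (not the one used in the answer) can find the first later row beyond a threshold in O(n log n): scan from the end while keeping a monotonic stack of candidate rows, and binary-search that stack for each row. This is a sketch with hypothetical function names; the down direction reuses the same routine on negated prices.

```python
from bisect import bisect_left


def first_jump_above(values, threshold):
    """For each i, the index of the first j > i with
    values[j] > values[i] + threshold, or -1 if there is none.

    Scans from the end, keeping a stack of candidate indices whose values
    strictly decrease toward the top (the end of the list), so each query
    is a binary search: O(n log n) overall.
    """
    n = len(values)
    result = [-1] * n
    idx_stack = []  # candidate indices; the last element is the nearest row
    neg_stack = []  # -values[idx], strictly increasing, as bisect requires
    for i in range(n - 1, -1, -1):
        target = values[i] + threshold
        # Entries with value > target form a prefix of the stack; the last
        # entry of that prefix is the nearest later row that qualifies.
        k = bisect_left(neg_stack, -target)
        if k > 0:
            result[i] = idx_stack[k - 1]
        # Row i hides every later row whose value is not above values[i].
        while neg_stack and -neg_stack[-1] <= values[i]:
            idx_stack.pop()
            neg_stack.pop()
        idx_stack.append(i)
        neg_stack.append(-values[i])
    return result


def next_moves_stack(times, prices, threshold=100.0):
    """Up: first later price above price + threshold.
    Down: the same search on negated prices, i.e. the first later price
    below price - threshold. None marks "no such later row".
    """
    up = first_jump_above(prices, threshold)
    down = first_jump_above([-p for p in prices], threshold)
    return ([times[j] if j >= 0 else None for j in up],
            [times[j] if j >= 0 else None for j in down])


sample_times = ["09:19:00", "09:56:00", "11:39:00", "12:18:00", "14:02:00"]
sample_prices = [3252.25, 3199.4, 3143.0, 2991.7, 3450.0]
up, down = next_moves_stack(sample_times, sample_prices)
```

The stack is safe to search because any later row missing from it was removed by an earlier row with a price at least as high, so the first qualifying row is always still on the stack. With a DataFrame, pass plain lists, e.g. next_moves_stack(df["Time"].tolist(), df["Price"].tolist()), so that times[j] indexes by position.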
