Reputation: 197
I have two dataframes coarse
and fine
. fine
contains 3 columns start_time, end_time, start_price, end_price
. coarse
contains start_time, end_time
. Note that all the times are pandas timestamp objects, e.g., 2016-12-12 01:03:13.15231+00:00
For coarse
I need to add 2 additional columns start_price, end_price
, where coarse.start_price
is the the fine.start_price
that has a fine.start_time
closest to coarse.start_time
, and analogously so for end_price
.
Here's an example for clarity:
coarse:
start_time end_time
2016-12-12 01:00:00.000+00:00 2016-12-12 02:00:00.000+00:00
2016-12-12 02:00:00.000+00:00 2016-12-12 03:00:00.000+00:00
2016-12-12 03:00:00.000+00:00 2016-12-12 03:30:00.000+00:00
fine:
start_time end_time start_price
2016-12-12 00:59:00.000+00:00 2016-12-12 01:12:00.000+00:00 2.3
2016-12-12 01:12:00.000+00:00 2016-12-12 01:15:00.000+00:00 4.5
2016-12-12 01:15:00.000+00:00 2016-12-12 01:45:00.000+00:00 5.7
2016-12-12 01:45:00.000+00:00 2016-12-12 01:55:00.000+00:00 8.8
2016-12-12 01:55:00.000+00:00 2016-12-12 02:15:00.000+00:00 9.9
2016-12-12 02:15:00.000+00:00 2016-12-12 02:16:00.000+00:00 11.2
2016-12-12 02:16:00.000+00:00 2016-12-12 02:31:00.000+00:00 13.5
2016-12-12 02:31:00.000+00:00 2016-12-12 02:45:00.000+00:00 14.8
2016-12-12 02:45:00.000+00:00 2016-12-12 02:59:00.000+00:00 15.9
2016-12-12 02:59:00.000+00:00 2016-12-12 03:31:00.000+00:00 16.0
So the result should be
coarse:
start_time end_time start_price
2016-12-12 01:00:00.000+00:00 2016-12-12 02:00:00.000+00:00 2.3
2016-12-12 02:00:00.000+00:00 2016-12-12 03:00:00.000+00:00 9.9
2016-12-12 03:00:00.000+00:00 2016-12-12 03:30:00.000+00:00 16.0
(I skipped end_price
because it's the same logic). I am wondering what is the easiest way to do this?
Additionally, rather than using the nearest neighbor, I may consider interpolating the price instead based on the times. Is there an easy way to do this as well?
Upvotes: 0
Views: 47
Reputation: 16172
You may be looking for pd.merge_asof which is perfect for joining on times that don't match exactly. There is a direction parameter which you can adjust to cause the match to go the direction you want.
import pandas as pd
coarse = pd.DataFrame({'start_time': ['2016-12-12 01:00:00.000+00:00', '2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00'], 'end_time': ['2016-12-12 02:00:00.000+00:00', '2016-12-12 03:00:00.000+00:00', '2016-12-12 03:30:00.000+00:00']} )
fine = pd.DataFrame({'start_time': ['2016-12-12 00:59:00.000+00:00', '2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00'], 'end_time': ['2016-12-12 01:12:00.000+00:00', '2016-12-12 01:15:00.000+00:00', '2016-12-12 01:45:00.000+00:00', '2016-12-12 01:55:00.000+00:00', '2016-12-12 02:15:00.000+00:00', '2016-12-12 02:16:00.000+00:00', '2016-12-12 02:31:00.000+00:00', '2016-12-12 02:45:00.000+00:00', '2016-12-12 02:59:00.000+00:00', '2016-12-12 03:31:00.000+00:00'], 'start_price': [2.3, 4.5, 5.7, 8.8, 9.9, 11.2, 13.5, 14.8, 15.9, 16.0]})
coarse['start_time'] = pd.to_datetime(coarse['start_time'])
fine['start_time'] = pd.to_datetime(fine['start_time'])
coarse['end_time'] = pd.to_datetime(coarse['end_time'])
fine['end_time'] = pd.to_datetime(fine['end_time'])
coarse = pd.merge_asof(coarse, fine[['start_time','start_price']],
on='start_time',
direction='nearest')
coarse = pd.merge_asof(coarse,
fine[['end_time','start_price']].rename(columns={'start_price':'end_price'}),
on='end_time',
direction='nearest')
Output
start_time end_time start_price end_price
0 2016-12-12 01:00:00+00:00 2016-12-12 02:00:00+00:00 2.3 8.8
1 2016-12-12 02:00:00+00:00 2016-12-12 03:00:00+00:00 9.9 15.9
2 2016-12-12 03:00:00+00:00 2016-12-12 03:30:00+00:00 16.0 16.0
Upvotes: 1