Reputation: 1278
I have two panda data frames in python. dataframe1 contains the sample data.
lat long tep height altitude date_time
40.007647 116.319781 0 83 39688.535613 2008-08-20 12:51:17
40.007632 116.319878 0 119 39688.535637 2008-08-20 12:51:19
40.007615 116.319838 0 112 39688.535660 2008-08-20 12:51:21
Dataframe 2 contains the following sample data.
Start_Time End_Time Transportation_Mode
2008-08-20 12:09:17 2008-08-20 12:45:05 walk
2008-08-20 12:45:05 2008-08-20 13:00:25 subway
2008-08-20 13:00:25 2008-08-20 13:07:25 walk
2008-08-20 13:07:25 2008-08-20 13:12:59 bus
2008-08-20 13:13:59 2008-08-20 13:24:23 walk
Dataframe1 should pick transportation mode from dataframe2 based on the condition, if data_time field in dataframe1 lies in between Start_Time
and End_Time
field of dataframe 2. Then pick Transportation_mode from dataframe2 and attach with Tranportation_Mode
in dataframe1.
End result should look like this
lat long tep height altitude date_time Transportation_Mode
40.007647 116.319781 0 83 39688.535613 2008-08-20 12:51:17 subway
40.007632 116.319878 0 119 39688.535637 2008-08-20 12:51:19 subway
40.007615 116.319838 0 112 39688.535660 2008-08-20 12:51:21 subway
Its equivalent sql statement is
select distinct df1.*, df2.Transportation_Mode
from df1,df2
where df1.date_time between df2.Start_Time and df2.End_Time
Upvotes: 0
Views: 88
Reputation: 7058
merge_asof can help you here
pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')
results in:
lat long tep height altitude date_time Start_Time End_Time Transportation_Mode 0 40.007647 116.319781 0 83 39688.535613 2008-08-20 12:51:17 2008-08-20 12:45:05 2008-08-20 13:00:25 subway 1 40.007632 116.319878 0 119 39688.535637 2008-08-20 12:51:19 2008-08-20 12:45:05 2008-08-20 13:00:25 subway 2 40.007615 116.319838 0 112 39688.535660 2008-08-20 12:51:21 2008-08-20 12:45:05 2008-08-20 13:00:25 subway
This only looks at the Start_Time
. If you want to check the End_Time
too you can do something like this:
start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')['Transportation_Mode']
end = pd.merge_asof(df1, df2, left_on='date_time', right_on='End_Time', direction='forward')['Transportation_Mode']
pd.concat((df1, start[start == end].reindex(df1.index)), axis=1)
lat long tep height altitude date_time Transportation_Mode 0 40.007647 116.319781 0 83 39688.535613 2008-08-20 12:51:17 subway 1 40.007632 116.319878 0 119 39688.535637 2008-08-20 12:51:19 subway 2 40.007615 116.319838 0 112 39688.535660 2008-08-20 12:51:21 subway
direction
If you can't use pandas > 0.20 You could try this:
start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')
transportation_mode = start['Transportation_Mode'].loc[(start['Start_Time'] < start['date_time']) & (start['date_time'] < start['End_Time'])]
pd.concat((df1, transportation_mode), axis=1)
lat long tep height altitude date_time Transportation_Mode 0 40.007647 116.319781 0 83 39688.535613 2008-08-20 12:51:17 subway 1 40.007632 116.319878 0 119 39688.535637 2008-08-20 12:51:19 subway 2 40.007615 116.319838 0 112 39688.535660 2008-08-20 12:51:21 subway 3894 40.088452 116.306029 0 177 39680.677581 2008-08-20 16:15:43 NaN 3895 40.088434 116.306011 0 178 39680.677604 2008-08-20 16:15:45 NaN 3896 40.088423 116.306002 0 179 39680.677627 2008-08-20 16:15:47 NaN 3897 40.088405 116.305990 0 179 39680.677650 2008-08-20 16:15:49 NaN 3898 40.088387 116.305963 0 180 39680.677674 2008-08-20 16:15:51 NaN
Upvotes: 1