Asif Khan
Asif Khan

Reputation: 1278

Date range comparisons in a pandas dataframe

I have two panda data frames in python. dataframe1 contains the sample data.

      lat        long      tep  height  altitude      date_time        
  40.007647  116.319781    0   83  39688.535613  2008-08-20 12:51:17   
  40.007632  116.319878    0  119  39688.535637  2008-08-20 12:51:19   
  40.007615  116.319838    0  112  39688.535660  2008-08-20 12:51:21   

Dataframe 2 contains the following sample data.

Start_Time                   End_Time               Transportation_Mode
2008-08-20 12:09:17     2008-08-20 12:45:05                walk
2008-08-20 12:45:05     2008-08-20 13:00:25              subway
2008-08-20 13:00:25     2008-08-20 13:07:25                walk
2008-08-20 13:07:25     2008-08-20 13:12:59                 bus
2008-08-20 13:13:59     2008-08-20 13:24:23                walk

Dataframe1 should pick transportation mode from dataframe2 based on the condition, if data_time field in dataframe1 lies in between Start_Time and End_Time field of dataframe 2. Then pick Transportation_mode from dataframe2 and attach with Tranportation_Mode in dataframe1.

End result should look like this

  lat        long      tep  height  altitude      date_time        Transportation_Mode
40.007647  116.319781    0   83  39688.535613  2008-08-20 12:51:17   subway   
40.007632  116.319878    0  119  39688.535637  2008-08-20 12:51:19   subway   
40.007615  116.319838    0  112  39688.535660  2008-08-20 12:51:21   subway   

Its equivalent sql statement is

select distinct df1.*, df2.Transportation_Mode   
from df1,df2   
where df1.date_time between df2.Start_Time and df2.End_Time 

Upvotes: 0

Views: 88

Answers (1)

Maarten Fabré
Maarten Fabré

Reputation: 7058

merge_asof can help you here

pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')

results in:

  lat     long    tep     height  altitude    date_time   Start_Time  End_Time    Transportation_Mode
0     40.007647   116.319781  0   83  39688.535613    2008-08-20 12:51:17     2008-08-20 12:45:05     2008-08-20 13:00:25     subway
1     40.007632   116.319878  0   119     39688.535637    2008-08-20 12:51:19     2008-08-20 12:45:05     2008-08-20 13:00:25     subway
2     40.007615   116.319838  0   112     39688.535660    2008-08-20 12:51:21     2008-08-20 12:45:05     2008-08-20 13:00:25     subway

This only looks at the Start_Time. If you want to check the End_Time too you can do something like this:

start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')['Transportation_Mode']
end = pd.merge_asof(df1, df2, left_on='date_time', right_on='End_Time', direction='forward')['Transportation_Mode']

pd.concat((df1, start[start == end].reindex(df1.index)), axis=1)
  lat     long    tep     height  altitude    date_time   Transportation_Mode
0     40.007647   116.319781  0   83  39688.535613    2008-08-20 12:51:17     subway
1     40.007632   116.319878  0   119     39688.535637    2008-08-20 12:51:19     subway
2     40.007615   116.319838  0   112     39688.535660    2008-08-20 12:51:21     subway

without direction

If you can't use pandas > 0.20 You could try this:

start = pd.merge_asof(df1, df2, left_on='date_time', right_on='Start_Time')
transportation_mode = start['Transportation_Mode'].loc[(start['Start_Time'] < start['date_time']) & (start['date_time'] < start['End_Time'])]
pd.concat((df1, transportation_mode), axis=1)
  lat     long    tep     height  altitude    date_time   Transportation_Mode
0     40.007647   116.319781  0   83  39688.535613    2008-08-20 12:51:17     subway
1     40.007632   116.319878  0   119     39688.535637    2008-08-20 12:51:19     subway
2     40.007615   116.319838  0   112     39688.535660    2008-08-20 12:51:21     subway
3894  40.088452   116.306029  0   177     39680.677581    2008-08-20 16:15:43     NaN
3895  40.088434   116.306011  0   178     39680.677604    2008-08-20 16:15:45     NaN
3896  40.088423   116.306002  0   179     39680.677627    2008-08-20 16:15:47     NaN
3897  40.088405   116.305990  0   179     39680.677650    2008-08-20 16:15:49     NaN
3898  40.088387   116.305963  0   180     39680.677674    2008-08-20 16:15:51     NaN

Upvotes: 1

Related Questions