Pandas dataframes - join on similar timestamps

Question

I have 2 dataframes,

small_df = 
   time_early            
0, 18:19:20.877154
1, 20:34:24.738802

and large_df, with many more rows

   time_late      
0, 11:12:23.879154
1, 11:12:23.879154            
2, 18:19:20.879154
3, 19:01:20.877154
4, 20:34:24.748802

I want to join them so that every row in small_df is matched with the row in large_df whose time_late comes immediately after its time_early, so that the desired result looks something like

   time_early           time_late 
0, 18:19:20.877154      18:19:20.879154
1, 20:34:24.738802      20:34:24.748802

Also, assume that these 2 dataframes may have other columns that I would like to keep in the final result. How do I achieve this? I know I need some kind of merge, but I'm not sure exactly which.

Nader Hisham · Accepted Answer

import pandas as pd

def join_closest_time(row):
    # boolean mask of the large_df rows whose time_late is later than this row's time_early
    time_greater = large_df.time_late > row['time_early']
    # take the first match; it is the closest later time only if
    # large_df is sorted by time_late in ascending order
    # (.iloc[0] raises IndexError if no row in large_df is later)
    close_date = large_df[time_greater].iloc[0]
    # concatenate the two rows (both are Series) into a single row
    df_final = pd.concat([row, close_date])
    return df_final

small_df.apply(join_closest_time, axis=1)


Out[116]:
    time_early          time_late
0   18:19:20.877154 18:19:20.879154
1   20:34:24.738802 20:34:24.748802

If your large_df is not sorted by time_late, you have to sort it in ascending order first. Note that sort_index(by=...) has been removed from pandas; use sort_values instead:

large_df.sort_values(by='time_late', inplace=True)
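
As an alternative to the row-by-row apply above, pandas also has pd.merge_asof, which can do this kind of "next later timestamp" join in one call. The following is only a sketch under some assumptions: the time columns are converted to timedeltas with pd.to_timedelta (the question does not say what dtype they are), both frames are sorted on their key, and allow_exact_matches=False is used to mirror the strict > comparison in the function above.

```python
import pandas as pd

# Sample data from the question; storing the times as timedeltas is an assumption
small_df = pd.DataFrame({
    "time_early": pd.to_timedelta(["18:19:20.877154", "20:34:24.738802"]),
})
large_df = pd.DataFrame({
    "time_late": pd.to_timedelta([
        "11:12:23.879154", "11:12:23.879154", "18:19:20.879154",
        "19:01:20.877154", "20:34:24.748802",
    ]),
})

# merge_asof requires both frames to be sorted on their key columns
merged = pd.merge_asof(
    small_df.sort_values("time_early"),
    large_df.sort_values("time_late"),
    left_on="time_early",
    right_on="time_late",
    direction="forward",        # match the first right-hand row at or after the left key
    allow_exact_matches=False,  # require strictly later, like the > comparison above
)
print(merged)
```

Any other columns in either dataframe are carried through to the result. Rows of small_df with no later row in large_df get NaT in time_late instead of raising an error.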
