Reputation: 4313
I have 2 dataframes,
small_df =
time_early
0, 18:19:20.877154
1, 20:34:24.738802
and large_df, with many more rows:
time_late
0, 11:12:23.879154
1, 11:12:23.879154
2, 18:19:20.879154
3, 19:01:20.877154
4, 20:34:24.748802
I want to join them in such a way that every row in small_df is joined to the row in large_df that comes immediately after it in time, so that the desired result looks something like
time_early time_late
0, 18:19:20.877154 18:19:20.879154
1, 20:34:24.738802 20:34:24.748802
Also, assume that these 2 dataframes may have other columns that I would like to maintain in the final result. How do I achieve this? I know I need some kind of merge, but not exactly sure.
Upvotes: 0
Views: 450
Reputation: 5414
import pandas as pd

def join_closest_time(row):
    # mask of large_df rows whose time_late is later than this row's time_early
    time_greater = large_df.time_late > row['time_early']
    # take the first such row; it is the closest one to time_early only if
    # large_df is sorted by time_late in ascending order
    # (raises IndexError if no later row exists)
    close_date = large_df[time_greater].iloc[0]
    # then concatenate the two rows into a single row
    return pd.concat([row, close_date])

small_df.apply(join_closest_time, axis=1)
Out[116]:
time_early time_late
0 18:19:20.877154 18:19:20.879154
1 20:34:24.738802 20:34:24.748802
If your large_df is not sorted by time_late, you have to sort it first in ascending order:
large_df.sort_values(by='time_late', inplace=True)
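For larger frames, the same "first later row" lookup can probably be done without a row-wise apply by using pd.merge_asof with direction='forward'. The following is only a sketch under some assumptions: the times are parsed as timedeltas (merge_asof does not accept plain string keys), and the event and payload columns are hypothetical, added just to show that other columns from both frames are carried through.

```python
import pandas as pd

# Sample data from the question. The "event" and "payload" columns are
# hypothetical extras, included only to show that other columns survive.
small_df = pd.DataFrame({
    "time_early": pd.to_timedelta(["18:19:20.877154", "20:34:24.738802"]),
    "event": ["x", "y"],
})
large_df = pd.DataFrame({
    "time_late": pd.to_timedelta([
        "11:12:23.879154", "11:12:23.879154", "18:19:20.879154",
        "19:01:20.877154", "20:34:24.748802",
    ]),
    "payload": ["a", "b", "c", "d", "e"],
})

# merge_asof requires both key columns to be sorted in ascending order.
small_sorted = small_df.sort_values("time_early")
large_sorted = large_df.sort_values("time_late")

result = pd.merge_asof(
    small_sorted,
    large_sorted,
    left_on="time_early",
    right_on="time_late",
    direction="forward",        # look for the next time_late, not the previous one
    allow_exact_matches=False,  # require time_late to be strictly later
)
print(result)
```

Rows of small_df with no later time_late should come back with missing values in the large_df columns rather than raising an error.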
Upvotes: 1
Reputation: 109636
If there is any time_late following a specific time_early value, take the first such value. Otherwise, use None.
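# select the time_late column explicitly instead of relying on it being
# the first column of large_df; "first" is the closest only if large_df
# is sorted by time_late
small_df['time_late'] = small_df.time_early.apply(
    lambda time: large_df.loc[large_df.time_late > time, 'time_late'].iloc[0]
    if large_df.time_late.gt(time).any() else None)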
>>> small_df
time_early time_late
0 18:19:20.877154 18:19:20.879154
1 20:34:24.738802 20:34:24.748802
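The "first later value, otherwise missing" rule can also be sketched with np.searchsorted on the sorted time_late values. This assumes the times are stored as timedeltas; the third row of small_df below is hypothetical, added only to show the case where nothing follows.

```python
import numpy as np
import pandas as pd

small_df = pd.DataFrame({
    "time_early": pd.to_timedelta([
        "18:19:20.877154", "20:34:24.738802",
        "23:59:59",  # hypothetical row with no later time_late
    ]),
})
large_df = pd.DataFrame({
    "time_late": pd.to_timedelta([
        "11:12:23.879154", "11:12:23.879154", "18:19:20.879154",
        "19:01:20.877154", "20:34:24.748802",
    ]),
})

# Sorted array of candidate times.
late_sorted = large_df["time_late"].sort_values().to_numpy()

# side="right" gives the position of the first value strictly greater
# than each time_early.
pos = np.searchsorted(late_sorted, small_df["time_early"].to_numpy(), side="right")
found = pos < len(late_sorted)

# Start with NaT everywhere, then fill in the rows that have a match.
matched = np.full(len(small_df), np.timedelta64("NaT"), dtype="timedelta64[ns]")
matched[found] = late_sorted[pos[found]]
small_df["time_late"] = matched
print(small_df)
```

Here the missing case is represented as NaT rather than None, which keeps the column as a timedelta dtype.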
Upvotes: 0