Reputation: 1336
I have two dataframes, one which is large (+million rows with several hundred columns) and one which is small(10s of columns and several hundred rows). They have data similar to this:
Dataframe 1:
ID val1 val2 Time1
234 0.32 1 306
234 0.93 0 309
235 1.02 1 305
Dataframe 2:
ID val3 val4 Time2
234 9621 234 302
234 12 0 308
235 500 9 305
I need to concatenate the values by ID constrained to the condition that the value of Time2
must be greater than Time1
and that the row chosen is the Min(Time2 - Time1)
of all possible options.
Such that the final output look something like:
ID val1 val2 Time1 val3 val4 Time2
234 0.32 1 306 12 0 308
234 0.93 0 309 Nan Nan Nan
235 1.02 1 305 500 9 305
What is the fastest method to perform this operation? To my knowledge the base concatenate options cannot perform this and row iteration is just painfully slow.
Upvotes: 0
Views: 131
Reputation: 323396
You can using merge_asof
pd.merge_asof(df1.sort_values('Time1'),df2.sort_values('Time2'),left_on='Time1',right_on='Time2',by='ID',direction='forward')
Out[677]:
ID val1 val2 Time1 val3 val4 Time2
0 235 1.02 1 305 500.0 9.0 305.0
1 234 0.32 1 306 12.0 0.0 308.0
2 234 0.93 0 309 NaN NaN NaN
Upvotes: 1