WolVes
WolVes

Reputation: 1336

Conditional Pandas Concat

I have two dataframes, one which is large (+million rows with several hundred columns) and one which is small(10s of columns and several hundred rows). They have data similar to this:

Dataframe 1:

ID    val1    val2    Time1
234   0.32       1      306
234   0.93       0      309
235   1.02       1      305

Dataframe 2:

ID    val3    val4    Time2
234   9621     234      302
234     12       0      308
235    500       9      305

I need to concatenate the values by ID constrained to the condition that the value of Time2 must be greater than Time1 and that the row chosen is the Min(Time2 - Time1) of all possible options.

Such that the final output look something like:

ID    val1    val2    Time1    val3    val4    Time2 
234   0.32       1      306     12       0      308
234   0.93       0      309    Nan     Nan      Nan
235   1.02       1      305    500       9      305

What is the fastest method to perform this operation? To my knowledge the base concatenate options cannot perform this and row iteration is just painfully slow.

Upvotes: 0

Views: 131

Answers (1)

BENY
BENY

Reputation: 323396

You can using merge_asof

pd.merge_asof(df1.sort_values('Time1'),df2.sort_values('Time2'),left_on='Time1',right_on='Time2',by='ID',direction='forward')
Out[677]: 
    ID  val1  val2  Time1   val3  val4  Time2
0  235  1.02     1    305  500.0   9.0  305.0
1  234  0.32     1    306   12.0   0.0  308.0
2  234  0.93     0    309    NaN   NaN    NaN

Upvotes: 1

Related Questions