Reputation: 23
I have 2 DataFrames
A_df = pd.DataFrame(data = np.arange(2, 103, 10) + np.random.randn(11),
columns = ['Time(s)'])
B_df = pd.DataFrame(data = zip(range(1, 102), np.random.randn(101)),
columns = ['Time(s)', 'Value'])
A_df.head()
Time(s)
0 2.751352
1 12.028663
2 20.638388
3 29.821199
4 42.516302
B_df.head()
Time(s) Value
0 1 1.075801
1 2 0.890754
2 3 -0.015543
3 4 0.085298
4 5 0.208645
I want to add a new column Value of B at T
to A_df
which is the latest value of B (from B_df
) at time t (in A_df
).
Currently I am doing this by using apply()
method as follows:
A_df['Value of B at T'] = A_df.apply(lambda x: B_df.loc[B_df['Time(s)'] <= x['Time(s)'], 'Value'].values[-1], axis = 1)
A_df
Time(s) Value of B at T
0 2.751352 -0.891782
1 12.028663 2.416335
2 20.638388 -0.186364
3 29.821199 -0.148716
4 42.516302 0.821272
I was wondering if there is a way to vectorize the functionality being carried out by apply()
method to speed up the code?
Upvotes: 1
Views: 61
Reputation: 26261
You can try the following:
A_df = A_df.assign(
b_at_t=B_df
.set_index('Time(s)')
.reindex(A_df['Time(s)'], method='ffill')
.values)
Upvotes: 1