Reputation: 1679
I want to merge 2 dataframes (df_a and df_b) on the shared column 'time_stamp'. Both contain floats, however 'time_stamp' in df_a increments by 1, and 'time_stamp' in df_b increments by .33. I want to merge them and aggregate all values of the df_b columns (by max) that fall within df_a time_stamp intervals. Please advise.
df_a
time_stamp data
0.0 b
1.0 b
2.0 c
3.0 a
df_b
time_stamp data_x data_y
0.33 1 0
0.66 0 0
0.99 1 0
1.32 1 3
1.65 1 0
1.98 0 0
2.31 0 0
2.64 0 0
2.97 0 0
This is an example of the desired result:
time_stamp data data_x data_y
0.0 b 1 0
1.0 b 1 3
2.0 c 0 0
3.0 a N/A N/A
Upvotes: 0
Views: 189
Reputation: 704
First define the step that you want.
step = 1
Create a cut, using the step
max_time = df['time_stamp'].max()
#Adding the step to the max_time or it will ignore the last value
cutp = pd.cut(df["time_stamp"], np.arange(0,max_time+step,step))
Filter
df_grouped = df.groupby(cutp).max()
Result
time_stamp data_x data_y
time_stamp
(0.0, 1.0] 0.99 1 0
(1.0, 2.0] 1.98 1 3
(2.0, 3.0] 2.97 0 0
Setting the data frame the way it should.
df_grouped = df_grouped.drop(columns=['time_stamp']).reset_index()
df_grouped['time_stamp'] = df_grouped['time_stamp'].apply(lambda val: val.left)
Result after treatment:
time_stamp data_x data_y
0 0.0 1 0
1 1.0 1 3
2 2.0 0 0
Merging then together
dfa.merge(df_grouped,on='time_stamp',how='left')
Result
time_stamp data data_x data_y
0 0.0 b 1.0 0.0
1 1.0 b 1.0 3.0
2 2.0 c 0.0 0.0
3 3.0 a NaN NaN
This seems kind of overwork but this solution will get the max of both columns data_x and data_y something that merge_asof won't, by my tests it only gets the first column (in the case data_x).
Upvotes: 1