Reputation: 222
I have a DF like so:
asset_id source_id open_px close_px start_bin end_bin
0 1 a None 10 2022-01-01 09:30:00 2022-01-01 10:00:00
1 1 a None 10 2022-01-01 10:00:00 2022-01-01 10:30:00
2 2 a None 101 2022-01-01 09:30:00 2022-01-01 10:00:00
3 2 a None 500 2022-01-01 10:00:00 2022-01-01 10:30:00
4 2 a None 600 2022-01-01 10:30:00 2022-01-01 11:00:00
code to generate:
rows=[
[1, 'a', None, 10, datetime.datetime(2022, 1, 1, 9, 30), datetime.datetime(2022, 1, 1, 10, 0)],
[1, 'a', None, 10, datetime.datetime(2022, 1, 1, 10, 0), datetime.datetime(2022, 1, 1, 10, 30)],
[2, 'a', None, 101, datetime.datetime(2022, 1, 1, 9, 30), datetime.datetime(2022, 1, 1, 10, 0)],
[2, 'a', None, 500, datetime.datetime(2022, 1, 1, 10, 0), datetime.datetime(2022, 1, 1, 10, 30)],
[2, 'a', None, 600, datetime.datetime(2022, 1, 1, 10, 30), datetime.datetime(2022, 1, 1, 11, 0)]
]
cols = ['asset_id', 'source_id', 'open_px', 'close_px', 'start_bin', 'end_bin']
df = pd.DataFrame(data=rows, columns=cols)
I want to get the open_px
by getting the last close from the bin corresponding to this rows start_bin, but also grouped by asset_id, in the most pandas friendly way. (happy for the first entry in each to remain None
). I do not want to brute force with a loop as the dataset is quite large.
Expected Output:
asset_id source_id open_px close_px start_bin end_bin
0 1 a None 10 2022-01-01 09:30:00 2022-01-01 10:00:00
1 1 a 10 10 2022-01-01 10:00:00 2022-01-01 10:30:00
2 2 a None 101 2022-01-01 09:30:00 2022-01-01 10:00:00
3 2 a 101 500 2022-01-01 10:00:00 2022-01-01 10:30:00
4 2 a 500 600 2022-01-01 10:30:00 2022-01-01 11:00:00
Upvotes: 0
Views: 98
Reputation: 3720
df.sort_values(['asset_id','start_bin'], inplace=True)
df['open_px'] = df['close_px'].shift()
df.loc[~df['asset_id'].duplicated(),'open_px'] = None
print(df)
asset_id source_id open_px close_px start_bin end_bin
0 1 a NaN 10 2022-01-01 09:30:00 2022-01-01 10:00:00
1 1 a 10.0 10 2022-01-01 10:00:00 2022-01-01 10:30:00
2 2 a NaN 101 2022-01-01 09:30:00 2022-01-01 10:00:00
3 2 a 101.0 500 2022-01-01 10:00:00 2022-01-01 10:30:00
4 2 a 500.0 600 2022-01-01 10:30:00 2022-01-01 11:00:00
Upvotes: 1