Reputation: 5152
I have a dataframe (lets call it "best_trades") that is such as follow:
exit_time netgain_by_MAE
2000-02-01 07:00:00 2000-02-01 11:30:00 2.083333
2000-02-01 08:30:00 2000-02-01 11:30:00 2.625000
2000-02-01 09:00:00 2000-02-01 11:30:00 3.312500
2000-02-01 09:30:00 2000-02-01 11:30:00 2.000000
2000-02-01 10:30:00 2000-02-01 11:30:00 2.312500
2000-02-02 13:30:00 2000-02-02 15:00:00 2.000000
2000-02-02 14:00:00 2000-02-02 15:00:00 2.428571
2000-02-02 15:00:00 2000-02-03 00:00:00 3.000000
2000-02-03 02:30:00 2000-02-03 07:30:00 2.000000
2000-02-03 10:00:00 2000-02-03 15:00:00 2.222222
2000-02-03 10:30:00 2000-02-04 00:00:00 3.259259
2000-02-03 11:00:00 2000-02-04 00:00:00 4.666667
2000-02-03 11:00:00 2000-02-04 01:00:00 2.629630
2000-02-03 11:00:00 2000-02-04 01:30:00 14.500000
2000-02-03 12:30:00 2000-02-04 01:30:00 11.600000
2000-02-03 13:00:00 2000-02-04 01:30:00 5.300000
2000-02-03 13:30:00 2000-02-04 01:30:00 2.700000
2000-02-03 14:00:00 2000-02-04 03:30:00 4.100000
2000-02-03 14:30:00 2000-02-04 03:30:00 2.600000
The dataframe represent a list of trades. The first column represent the entry time of the trade, the second the exit time, the 3rd column , a custom metric i created.
I would like to compute the highest value of netgain_by_MAE for trades that are overlapping in time, for instance those ones:
2000-02-01 07:00:00 2000-02-01 11:30:00 2.083333
2000-02-01 08:30:00 2000-02-01 11:30:00 2.625000
2000-02-01 09:00:00 2000-02-01 11:30:00 3.312500
2000-02-01 09:30:00 2000-02-01 11:30:00 2.000000
2000-02-01 10:30:00 2000-02-01 11:30:00 2.312500
(here the result would be 2000-02-01 09:00:00 2000-02-01 11:30:00 3.312500
)
I managed to retrieve those rows with the groupby function:
best_trades = best_trades.groupby(['MAE_ts'])['netgain_by_MAE'].transform('max')
However this returns a df such as follow:
2000-02-01 07:00:00 2.083333
2000-02-01 08:30:00 3.312500
2000-02-01 09:00:00 3.312500
2000-02-01 09:30:00 3.312500
2000-02-01 10:30:00 3.312500
2000-02-02 13:30:00 2.428571
...
How to keep the same columns as the original dataframe, only keeping the rows that have the max netgain_by_mae value for each groupby ? (ie i would like to have the following output :
2000-02-01 09:00:00 2000-02-01 11:30:00 3.312500
2000-02-02 14:00:00 2000-02-02 15:00:00 2.428571
2000-02-02 15:00:00 2000-02-03 00:00:00 3.000000
...
Upvotes: 0
Views: 40
Reputation: 61947
If I am interpreting your question correctly then all you must do is sort the DataFrame by exit time and netgain_by_MAE descending and then take the first row of each exit_time group.
df = best_trades.sort_values(['exit_time', 'netgain_by_MAE'], ascending=[True,False])\
.groupby('exit_time')\
.first()\
.reset_index()
print(df[['entry_time', 'exit_time', 'netgain_by_MAE']])
entry_time exit_time netgain_by_MAE
0 2000-02-01 09:00:00 2000-02-01 11:30:00 3.312500
1 2000-02-02 14:00:00 2000-02-02 15:00:00 2.428571
2 2000-02-02 15:00:00 2000-02-03 00:00:00 3.000000
3 2000-02-03 02:30:00 2000-02-03 07:30:00 2.000000
4 2000-02-03 10:00:00 2000-02-03 15:00:00 2.222222
5 2000-02-03 11:00:00 2000-02-04 00:00:00 4.666667
6 2000-02-03 11:00:00 2000-02-04 01:00:00 2.629630
7 2000-02-03 11:00:00 2000-02-04 01:30:00 14.500000
8 2000-02-03 14:00:00 2000-02-04 03:30:00 4.100000
Upvotes: 1