Reputation: 732
I have a dataframe with my order executions on the stock exchange. In most cases, my buy order is filled in several partial executions, and those executions usually share the same timestamp.
So I can use groupby on the index to get an overview of my trades. However, sometimes the partial executions don't fall in exactly the same second, and that's where my problem comes from.
How do I group all order confirmations that fall within a span of 2 minutes, so that small differences in time are ignored?
This is for example my input
2019-09-17 18:30:46 23.5600 100.0 BOT 2356.0000
2019-09-17 18:30:47 23.5600 82.0 BOT 1931.9200
2019-09-17 18:30:47 23.5600 100.0 BOT 2356.0000
2019-09-17 18:30:47 23.5600 100.0 BOT 2356.0000
2019-09-17 18:30:47 23.5600 100.0 BOT 2356.0000
2019-09-17 18:30:47 23.5600 100.0 BOT 2356.0000
2019-09-17 18:30:47 23.5600 100.0 BOT 2356.0000
2019-09-17 18:30:47 23.5600 100.0 BOT 2356.0000
And I want it to look like the line below, but that first fill at 18:30:46 is creating a mess.
2019-09-17 18:30:47 23.560000 782.0 18423.9200
My code is extremely simple.
tradesum = tradeoverview.groupby(tradeoverview.index).sum()
How could I solve this? Maybe I can resample the df to 2-minute intervals? But then I still run the risk of a single order being split exactly across an interval boundary.
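One way to sidestep the boundary risk entirely is to group by gaps instead of fixed bins: start a new group whenever the time since the previous fill exceeds the threshold. A minimal sketch, with assumed column names (price, qty, side, value) since the question's dataframe isn't shown:

```python
import pandas as pd

# Rebuild the question's sample data (column names are assumptions)
df = pd.DataFrame(
    {
        "price": [23.56] * 8,
        "qty": [100.0, 82.0] + [100.0] * 6,
        "side": ["BOT"] * 8,
        "value": [2356.0, 1931.92] + [2356.0] * 6,
    },
    index=pd.to_datetime(
        ["2019-09-17 18:30:46"] + ["2019-09-17 18:30:47"] * 7
    ),
)

# Start a new group whenever more than 2 minutes pass since the previous fill;
# cumsum turns those break points into consecutive group ids
gap = df.index.to_series().diff() > pd.Timedelta("2min")
group_id = gap.cumsum()

tradesum = df.groupby(group_id).agg(
    {"price": "max", "qty": "sum", "side": "max", "value": "sum"}
)
print(tradesum)
```

Unlike fixed 2-minute bins, this only splits two fills into separate trades when they are actually more than 2 minutes apart, so a trade straddling a bin edge stays in one group.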
Upvotes: 0
Views: 43
Reputation: 858
You can use the resample method to group your data based on its datetime index into buckets of however many seconds or minutes (or other units of time). Here I passed '2s' to sample every 2 seconds; you could pass '1min' to sample every minute, etc.
df.resample('2s').agg({'c1': 'max',
                       'c2': 'sum',
                       'c3': 'max',
                       'c4': 'sum'})
Assuming the index is named dt and the columns are c1, c2, c3, c4, this gives:
c1 c2 c3 c4
dt
2019-09-17 18:30:46 23.56 782 BOT 18423.92
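For reference, here is the approach above as a self-contained script, rebuilding the question's sample data with the assumed column names c1 through c4:

```python
import pandas as pd

# Rebuild the question's sample fills with a datetime index
df = pd.DataFrame(
    {
        "c1": [23.56] * 8,                        # price
        "c2": [100.0, 82.0] + [100.0] * 6,        # quantity
        "c3": ["BOT"] * 8,                        # side
        "c4": [2356.0, 1931.92] + [2356.0] * 6,   # trade value
    },
    index=pd.to_datetime(
        ["2019-09-17 18:30:46"] + ["2019-09-17 18:30:47"] * 7
    ),
)

# Group fills into 2-second buckets and aggregate per column
out = df.resample("2s").agg({"c1": "max", "c2": "sum", "c3": "max", "c4": "sum"})
print(out)
```

Both timestamps fall in the bucket starting at 18:30:46, so the eight partial fills collapse into the single summary row shown above.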
Upvotes: 2