Reputation: 33
I have some data (trade data) in a dataframe that I need to summarise on a daily basis.
The data is connected to flows, so I have lots of rows, which have the following data points that are important for this (there are many more so not reproducing here)
Volume, Start Date, End Date
So for example:
Volume, Start Date, End Date
100, 1 April 2019, 30 June 2019
50, 1 Jan 2019, 30 June 2019
All these rows can have overlapping start and end dates.
What I am looking for is a simple way to create an output with: (using the 2 trades above as example data) Day, Total Volume
So for example:
1 Jan, 50
2 Jan, 50
....
1 April, 150
2 April, 150
...
29 June, 150
30 June, 150
1 July, 0
...
31 Dec, 0
I could iterate through each row, and then through each trade. But with so many, that seems a long slow process.
I am using Jupyter notebooks, with Python 3.
Any help appreciated - even if it is pointing me towards a solution I might have missed in my search here...
Upvotes: 1
Views: 41
Reputation: 33
I have solved this question via filtering.
def volumebyday(date, df, start, end, col):
#date = date to sum, df = dataframe, start/end = name of start/end date column, col = column to sum
trade1 = df[start] <= date
trade2 = df[end] >= date
return df[col][trade1 & trade2].sum(axis=0)
then using this via a lambda function
df = df.apply(lambda row: volumebyday(row["date"], trades, "start_date","end_date","qty")
Upvotes: 1