SwissRef
SwissRef

Reputation: 33

Python: Conversion of data into timeseries

I have some data (trade data) in a dataframe that I need to summarise on a daily basis.

The data is connected to flows, so I have lots of rows, which have the following data points that are important for this (there are many more so not reproducing here)

Volume, Start Date, End Date

So for example:

Volume, Start Date, End Date   
100, 1 April 2019, 30 June 2019   
50, 1 Jan 2019, 30 June 2019 

All these rows can have overlapping start and end dates.

What I am looking for is a simple way to create an output with: (using the 2 trades above as example data) Day, Total Volume

So for example:

1 Jan, 50  
2 Jan, 50  
....  
1 April, 150  
2 April, 150  
...  
29 June, 150  
30 June, 150  
1 July, 0   
...  
31 Dec, 0  

I could iterate through each row, and then through each trade. But with so many, that seems a long slow process.

I am using Jupyter notebooks, with Python 3.

Any help appreciated - even if it is pointing me towards a solution I might have missed in my search here...

Upvotes: 1

Views: 41

Answers (1)

SwissRef
SwissRef

Reputation: 33

I have solved this question via filtering.

def volumebyday(date, df, start, end, col):
#date = date to sum, df = dataframe, start/end = name of start/end date column, col = column to sum
    trade1 = df[start] <= date
    trade2 = df[end] >= date
    return df[col][trade1 & trade2].sum(axis=0)

then using this via a lambda function

df = df.apply(lambda row: volumebyday(row["date"], trades, "start_date","end_date","qty")

Upvotes: 1

Related Questions