Reputation: 42
I found a way to get the number I need, but I was hoping to get some input on how to accomplish it in a less cumbersome way. I need a running total of transactions to date in order to make it into a plotly plot. The data I have only includes a few columns: id, date, and amount. Here's the code I have so far:
fy20 = pd.read_excel('./data/transactions.xlsx', parse_dates=['date'])
def daily_money(df):
df = df.groupby('date').amount.sum()
df = df.groupby(df.index.day).cumsum()
df = df.cumsum().to_frame().reset_index()
return df
fy20 = daily_money(fy20)
This appears to accomplish the goal, but it seems like there must be a simpler way. Please let me know if you have any suggestions on how to simplify this.
Upvotes: 0
Views: 974
Reputation: 4263
It looks to me like this should work:
df.groupby('date')['amount'].sum().cumsum()
This works because DataFrame.groupby automatically sorts by the group keys, so the cumulative sum is already looking at the data it needs.
If you want it as a DataFrame with a new index instead of a Series, you can just use Series.reset_index
, which converts the series to a DataFrame first, but unless you need the date as a normal column (rather than the index) later, you don't need to do that.
Upvotes: 1