Reputation: 80
So I have two csv bank statement files that look similar to this and I am trying to make a graph of my total balance over time:
Bank A Bank B
Date Amount Date Amount
24/09 240
25/09 1400 23/09 200
23/09 1250 22/09 250
20/09 1235 21/09 200
The problem is that if I simply merge them together and graph them; if there is only balance information for one account on a particular day then my balance would look like only this much. Eg. on the 22/09 it would show £250 (bank B) when its really £250(bank b)+£1235(bank a from the day before which is money I still have) = £1485.
Please let me know if there is a better solution but it seems that I need to insert blank rows into each dataset to look like this (for both):
Bank A
Date Amount
25/09 1400
24/09 (null)
23/09 1250
22/09 (null)
21/09 (null)
20/09 1235
...And then fill in the balance from the last time one was shown:
Bank A
Date Amount
25/09 1400
24/09 1250
23/09 1250
22/09 1235
21/09 1235
20/09 1235
Then the same for bank B:
Bank B
Date Amount
25/09 null
24/09 240
23/09 200
22/09 250
21/09 200
Becomes:
Bank B
Date Amount
25/09 240
24/09 240
23/09 200
22/09 250
21/09 200
Then I can sum the columns across each file and get a total balance on each day like this:
Both Banks
Date Amount
25/09 1640
24/09 1490
23/09 1450
22/09 1485
21/09 1235
Thanks I hope this is clear enough (and clearer after edit.
Upvotes: 0
Views: 69
Reputation: 323276
By using pd.concat
and bfill
BB.Amount=np.nan
df=pd.concat([BA,BB.loc[~BB.Date.isin(BA.Date),:]],axis=0)
df.reset_index(drop=True).sort_values('Date',ascending=False).bfill()
then, You got your dataframe
Bank A
Out[1240]:
Date Amount
0 25/09 1100.0
1 25/09 1400.0
4 24/09 1250.0
2 23/09 1250.0
5 22/09 1235.0
6 21/09 1235.0
3 20/09 1235.0
Edit :
dfA=dfA.set_index('Date')
dfB=dfB.set_index('Date')
dfA.groupby(level=0).sum()# add group by here
dfB.groupby(level=0).sum()
idx=list(set(dfA.index).union(dfB.index))
dfA=dfA.reindex(idx).sort_index(ascending=False).bfill()
dfB=dfB.reindex(idx).sort_index(ascending=False).bfill()
dfA+dfB
Out[25]:
Amount
Date
25/09 1640.0
24/09 1490.0
23/09 1450.0
22/09 1485.0
21/09 1435.0
20/09 NaN
Upvotes: 2
Reputation: 38415
You can merge the two dataframes on date and then sum the amount on axis 1
df = dfA.merge(dfB, on = 'Date', how = 'outer')
df['Amount'] = df.sum(1)
Now you can groupby date to sum across dates
df.groupby('Date').Amount.sum().reset_index()
You get
Date Amount
0 20/09 1235.0
1 21/09 200.0
2 22/09 250.0
3 23/09 1450.0
4 24/09 240.0
5 25/09 2500.0
EDIT: Borrowing @wen's idx= ..
dfB.set_index('Date', inplace = True)
idx = list(set(dfA.index).union(dfB.index))
dfA = dfA.reindex(idx).sort_index().reset_index()
dfA['Amount'].ffill(inplace = True)
dfB = dfB.reindex(idx).sort_index().reset_index()
dfB['Amount'].ffill(inplace = True)
Now you merge and find sum
dfA.merge(dfB, on = 'Date',how = 'outer')
df['Amount'] = df.sum(1)
You get
Date Amount_x Amount_y Amount
0 20/09 1235.0 NaN 1235.0
1 21/09 1235.0 200.0 1435.0
2 22/09 1235.0 250.0 1485.0
3 23/09 1250.0 200.0 1450.0
4 24/09 1250.0 240.0 1490.0
5 25/09 1400.0 240.0 1640.0
Upvotes: 1