Jack Rickards
Jack Rickards

Reputation: 80

Create blank rows between rows in a dataset and then fill them using rows beneath. Python

So I have two csv bank statement files that look similar to this and I am trying to make a graph of my total balance over time:

Bank A Bank B Date Amount Date Amount 24/09 240 25/09 1400 23/09 200 23/09 1250 22/09 250 20/09 1235 21/09 200

The problem is that if I simply merge them together and graph them; if there is only balance information for one account on a particular day then my balance would look like only this much. Eg. on the 22/09 it would show £250 (bank B) when its really £250(bank b)+£1235(bank a from the day before which is money I still have) = £1485.

Please let me know if there is a better solution but it seems that I need to insert blank rows into each dataset to look like this (for both):

Bank A Date Amount 25/09 1400 24/09 (null) 23/09 1250 22/09 (null) 21/09 (null) 20/09 1235

...And then fill in the balance from the last time one was shown: Bank A Date Amount 25/09 1400 24/09 1250 23/09 1250 22/09 1235 21/09 1235 20/09 1235

Then the same for bank B:

Bank B Date Amount 25/09 null 24/09 240 23/09 200 22/09 250 21/09 200

Becomes:

Bank B Date Amount 25/09 240 24/09 240 23/09 200 22/09 250 21/09 200

Then I can sum the columns across each file and get a total balance on each day like this:

Both Banks Date Amount 25/09 1640 24/09 1490 23/09 1450 22/09 1485 21/09 1235

Thanks I hope this is clear enough (and clearer after edit.

Upvotes: 0

Views: 69

Answers (2)

BENY
BENY

Reputation: 323276

By using pd.concat and bfill

BB.Amount=np.nan
df=pd.concat([BA,BB.loc[~BB.Date.isin(BA.Date),:]],axis=0)
df.reset_index(drop=True).sort_values('Date',ascending=False).bfill()

then, You got your dataframe Bank A

Out[1240]: 
    Date  Amount
0  25/09  1100.0
1  25/09  1400.0
4  24/09  1250.0
2  23/09  1250.0
5  22/09  1235.0
6  21/09  1235.0
3  20/09  1235.0

Edit :

dfA=dfA.set_index('Date')
dfB=dfB.set_index('Date')
dfA.groupby(level=0).sum()# add group by here 
dfB.groupby(level=0).sum()
idx=list(set(dfA.index).union(dfB.index))
dfA=dfA.reindex(idx).sort_index(ascending=False).bfill()
dfB=dfB.reindex(idx).sort_index(ascending=False).bfill()
dfA+dfB

Out[25]: 
       Amount
Date         
25/09  1640.0
24/09  1490.0
23/09  1450.0
22/09  1485.0
21/09  1435.0
20/09     NaN

Upvotes: 2

Vaishali
Vaishali

Reputation: 38415

You can merge the two dataframes on date and then sum the amount on axis 1

df = dfA.merge(dfB, on = 'Date', how = 'outer')
df['Amount'] = df.sum(1)

Now you can groupby date to sum across dates

df.groupby('Date').Amount.sum().reset_index()

You get

    Date    Amount
0   20/09   1235.0
1   21/09   200.0
2   22/09   250.0
3   23/09   1450.0
4   24/09   240.0
5   25/09   2500.0

EDIT: Borrowing @wen's idx= ..

dfB.set_index('Date', inplace = True)
idx = list(set(dfA.index).union(dfB.index))
dfA = dfA.reindex(idx).sort_index().reset_index()
dfA['Amount'].ffill(inplace = True)
dfB = dfB.reindex(idx).sort_index().reset_index()
dfB['Amount'].ffill(inplace = True)

Now you merge and find sum

dfA.merge(dfB, on = 'Date',how = 'outer')
df['Amount'] = df.sum(1)

You get

    Date    Amount_x    Amount_y Amount
0   20/09   1235.0      NaN     1235.0
1   21/09   1235.0      200.0   1435.0
2   22/09   1235.0      250.0   1485.0
3   23/09   1250.0      200.0   1450.0
4   24/09   1250.0      240.0   1490.0
5   25/09   1400.0      240.0   1640.0

Upvotes: 1

Related Questions