Reputation: 201
I want to merge all blank rows such as 4,5,6 into 3 and 8,9,10 into 7 and so on...
I tried this but getting wrong answer.
dataframe=dataframe.replace(r'^\s*$', np.nan, regex=True)
dataframe=dataframe[:-1].fillna(method='ffill')
final_data = dataframe[dataframe['Value Date'].notna()]
final_data = final_data.fillna(0)
x = final_data.groupby(['Booking','Debit','Credit', 'Balance'])['Text'].apply(','.join)
x = x.reset_index()
Help me with it.
Upvotes: 0
Views: 1476
Reputation: 8229
My solution starts as @Manakin, assuming the dict is called data
:
Timestamp = pd.Timestamp
NaT = pd.NaT
df = pd.DataFrame(data)
and then:
(df.groupby((df['Balance'] != '').cumsum())
.agg(dict(dict.fromkeys(df,'first'), **{'Booking Text':', '.join}))
)
produces
Balance Booking Date Txn Date Booking Text Value Date Debit Credit Balance
--------- ------------------- ------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------- ---------- -------------- --------------
1 2018-03-31 00:00:00 2018-03-31 00:00:00 Initial Balance NaT 0.00
2 2018-01-04 00:00:00 2018-01-04 00:00:00 VALUE DATED BALANCE BROUGHT FORWARD 2018-01-04 00:00:00 180,431,640.00 180,431,640.00
3 2018-06-04 00:00:00 2018-06-04 00:00:00 INTEREST-FIXED TERM LOAN, Contract No: 3001-AA18091ZN72C|Interest rate:, 0.810000%|Capital: 1,000,000,000.00|Period: 16.03.2018 -, 06.04.2018|Days: 21/360 2018-06-04 00:00:00 472,500.00 179,959,140.00
4 2018-06-04 00:00:00 2018-06-04 00:00:00 INTEREST-FIXED TERM LOAN, Contract No: 3001-AA1809166QNF|Interest rate: 2018-06-04 00:00:00 315,000.00 179,644,140.00
B anking, 0.810000%|Capital: 1,000,000,000.00|Period: 23.03.2018 -, 06.04.2018|Days: 14/360
5 2018-06-04 00:00:00 2018-06-04 00:00:00 FOREX SPOT, EUR/JPY 130.7271 2018-06-04 00:00:00 472,500.00 180,116,640.00
6 2018-06-04 00:00:00 2018-06-04 00:00:00 FOREX SPOT, EUR/JPY 130.7021 2018-06-04 00:00:00 315,000.00 180,431,640.00
7 2018-09-04 00:00:00 2018-09-04 00:00:00 INTEREST-FIXED TERM LOAN, Contract No: 3001-AA18091NTT8N|Interest rate:, 0.810000%|Capital: 500,000,000.00|Period: 26.03.2018 -, 09.04.2018|Days: 14/360, , 2018-09-04 00:00:00 157,500.00 180,274,140.00
8 2018-09-04 00:00:00 2018-09-04 00:00:00 FOREX SPOT 2018-09-04 00:00:00 157,500.00 180,431,640.00
EUR/JPY 131.1407
9 2018-10-04 00:00:00 2018-10-04 00:00:00 INTEREST-FIXED TERM LOAN, , Contract No: 3001-AA18091NL9XG|Interest rate:, 0.810000%|Capital: 500,000,000.00|Period: 27.03.2018 - 2018-10-04 00:00:00 157,500.00 180,274,140.00
10.04.2018|Days: 14/360
Here we groupby rows that have 'Balance' empty to the row above, and then apply first
aggregation to all columns except to 'Booking Text'
where we concatenate strings together
Upvotes: 2
Reputation: 23099
From your dict, assuming it's called d
we can create a new key
to groupby and join the string columns.
import pandas as pd
Timestamp = pd.Timestamp, NaT = pd.NaT
df = pd.DataFrame.from_dict(d)
df['key'] = (df['Booking Date'].isnull() & df['Booking Date'].shift(1).isnull()==False).astype(int).cumsum()
s = df.groupby('key')['Booking Text'].agg(' '.join)
s.index = s.index - 1
df1 = df.dropna(subset=['Booking Date']).copy()
df1['Booking Text'] = df1['Booking Text'] + ' ' + df1.pop('key').map(s)
print(df1)
Booking Date Txn Date Booking Text \
1 2018-03-31 2018-03-31 Initial Balance VALUE DATED BALANCE BROUGHT FO...
2 2018-01-04 2018-01-04 VALUE DATED BALANCE BROUGHT FORWARD INTEREST-F...
3 2018-06-04 2018-06-04 INTEREST-FIXED TERM LOAN Contract No: 3001-AA1...
7 2018-06-04 2018-06-04 INTEREST-FIXED TERM LOAN Contract No: 3001-AA1...
11 2018-06-04 2018-06-04 FOREX SPOT EUR/JPY 130.7271
13 2018-06-04 2018-06-04 FOREX SPOT EUR/JPY 130.7021
15 2018-09-04 2018-09-04 INTEREST-FIXED TERM LOAN Contract No: 3001-AA1...
21 2018-09-04 2018-09-04 FOREX SPOT\nEUR/JPY 131.1407 INTEREST-FIXED TE...
22 2018-10-04 2018-10-04 INTEREST-FIXED TERM LOAN Contract No: 3001-AA...
Value Date Debit Credit Balance
1 NaT 0.00
2 2018-01-04 180,431,640.00 180,431,640.00
3 2018-06-04 472,500.00 179,959,140.00
7 2018-06-04 315,000.00 179,644,140.00
11 2018-06-04 472,500.00 180,116,640.00
13 2018-06-04 315,000.00 180,431,640.00
15 2018-09-04 157,500.00 180,274,140.00
21 2018-09-04 157,500.00 180,431,640.00
22 2018-10-04 157,500.00 180,274,140.00
Upvotes: 1