Reputation: 471
Imagine having the following df:
Document type Invoicenumber Invoicedate description quantity unit price line amount
Invoice 123 28-08-2020
0 NaN 17-09-2020 test 1,5 5 20
0 NaN 16-04-2020 test2 1,5 5 20
Invoice 456 02-03-2020
0 NaN NaN test3 21 3 64
0 0 NaN test3 21 3 64
0 0 NaN test3 21 3 64
The rows where there is a 0 are belonging to the row above and are line items of the same document.
My goal is to transpose the line items so that these are on the same line for each invoice as such: I've tried to transpose them based on index but this did not work..
**Document type** **Invoicenumber Invoicedate** description#1 description#2 quantity quantity#2 unit price unit price #2 line amount line amount #2
Invoice 123 28-08-2020 test test2 1,5 1,5 5 5 20 20
and for the second row:
**Document type** **Invoicenumber Invoicedate** description#1 description#2 description #3 quantity quantity#2 quantity #3 unit price unit price #2 unit price #3 line amount line amount #2 line amount #3
Invoice 123 28-08-2020 test3 test3 test3 21 21 21 3 3 3 64 64 64
here is the dictionary code:
df = pd.DataFrame.from_dict({'Document Type': {0: 'IngramMicro.AccountsPayable.Invoice',
1: 0,
2: 0,
3: 'IngramMicro.AccountsPayable.Invoice',
4: 0,
5: 0,
6: 0},
'Factuurnummer': {0: '0.78861803',
1: 'NaN',
2: 'NaN',
3: '202130534',
4: 'NaN',
5: 'NaN',
6: 'NaN'},
'Factuurdatum': {0: '2021-05-03',
1: 'NaN',
2: 'NaN',
3: '2021-09-03',
4: 'NaN',
5: 'NaN',
6: 'NaN'},
'description': {0: 'NaN',
1: 'TM 300 incl onderstel 3058C003 84433210 4549292119381',
2: 'ESP 5Y 36 inch 7950A539 00000000 4960999794266',
3: 'NaN',
4: 'Basistarief A3 Office',
5: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021',
6: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021'},
'quantity': {0: 'NaN', 1: 1.0, 2: 1.0, 3: 'NaN', 4: 1.0, 5: 1.0, 6: 2.0},
'unit price': {0: 'NaN',
1: 1211.63,
2: 742.79,
3: 'NaN',
4: 260.0,
5: 30.0,
6: 30.0},
'line amount': {0: 'NaN',
1: 21.0,
2: 21.0,
3: 'NaN',
4: 260.0,
5: 30.0,
6: 30.0}})
I've tried the following:
df = pd.DataFrame(data=d1)
However failing to accomplish significant results.
Please help !
Upvotes: 2
Views: 57
Reputation: 8219
Here is what you can do. First we enumerate the groups and the line items within each group, and clean up 'Document Type':
import numpy as np
df['g'] = df['Document Type'].ne(0).cumsum()
df['l'] = df.groupby('g').cumcount()
df['Document Type'] = df['Document Type'].replace(0,np.nan).fillna(method = 'ffill')
df
we get
Document Type Factuurnummer Factuurdatum description quantity unit price line amount g l
-- ----------------------------------- --------------- -------------- ------------------------------------------------------------------------ ---------- ------------ ------------- --- ---
0 IngramMicro.AccountsPayable.Invoice 0.788618 2021-05-03 NaN nan nan nan 1 0
1 IngramMicro.AccountsPayable.Invoice nan NaN TM 300 incl onderstel 3058C003 84433210 4549292119381 1 1211.63 21 1 1
2 IngramMicro.AccountsPayable.Invoice nan NaN ESP 5Y 36 inch 7950A539 00000000 4960999794266 1 742.79 21 1 2
3 IngramMicro.AccountsPayable.Invoice 2.02131e+08 2021-09-03 NaN nan nan nan 2 0
4 IngramMicro.AccountsPayable.Invoice nan NaN Basistarief A3 Office 1 260 260 2 1
5 IngramMicro.AccountsPayable.Invoice nan NaN Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021 1 30 30 2 2
6 IngramMicro.AccountsPayable.Invoice nan NaN Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021 2 30 30 2 3
Now we can index on 'g' and 'l' and then move 'l' to columns via unstack
. we drop columns that are all NaNs
df2 = df.set_index(['g','Document Type','l']).unstack(level = 2).replace('NaN',np.nan).dropna(axis='columns', how = 'all')
we rename column labels to be single-level:
df2.columns = [tup[0] + '_' + str(tup[1]) for tup in df2.columns.values]
df2.reset_index().drop(columns = 'g')
and we get something that looks like what you are after, I believe
Document Type Factuurnummer_0 Factuurdatum_0 description_1 description_2 description_3 quantity_1 quantity_2 quantity_3 unit price_1 unit price_2 unit price_3 line amount_1 line amount_2 line amount_3
-- ----------------------------------- ----------------- ---------------- ----------------------------------------------------- ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------ ------------ ------------ -------------- -------------- -------------- --------------- --------------- ---------------
0 IngramMicro.AccountsPayable.Invoice 0.788618 2021-05-03 TM 300 incl onderstel 3058C003 84433210 4549292119381 ESP 5Y 36 inch 7950A539 00000000 4960999794266 nan 1 1 nan 1211.63 742.79 nan 21 21 nan
1 IngramMicro.AccountsPayable.Invoice 2.02131e+08 2021-09-03 Basistarief A3 Office Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021 Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021 1 1 2 260 30 30 260 30 30
Upvotes: 1