Max
Max

Reputation: 471

Transposing values in df?

Imagine having the following df:

Document type   Invoicenumber       Invoicedate  description quantity unit price line amount

Invoice             123            28-08-2020
0                   NaN            17-09-2020    test        1,5        5              20
0                   NaN            16-04-2020    test2       1,5        5              20

Invoice             456            02-03-2020
0                   NaN            NaN           test3       21         3              64
0                   0              NaN           test3       21         3              64            
0                   0              NaN           test3       21         3              64 

The rows where there is a 0 are belonging to the row above and are line items of the same document.

My goal is to transpose the line items so that these are on the same line for each invoice as such: I've tried to transpose them based on index but this did not work..

**Document type** **Invoicenumber Invoicedate**  description#1  description#2 quantity quantity#2 unit price   unit price #2 line amount line amount #2 
    
    Invoice             123            28-08-2020    test           test2          1,5      1,5     5            5              20         20

and for the second row:

**Document type** **Invoicenumber Invoicedate**  description#1  description#2 description #3 quantity quantity#2 quantity #3 unit price   unit price #2 unit price #3   line amount line amount #2 line amount #3
    
    Invoice             123            28-08-2020    test3           test3         test3         21      21           21            3              3           3              64         64            64

here is the dictionary code:

df = pd.DataFrame.from_dict({'Document Type': {0: 'IngramMicro.AccountsPayable.Invoice',
  1: 0,
  2: 0,
  3: 'IngramMicro.AccountsPayable.Invoice',
  4: 0,
  5: 0,
  6: 0},
 'Factuurnummer': {0: '0.78861803',
  1: 'NaN',
  2: 'NaN',
  3: '202130534',
  4: 'NaN',
  5: 'NaN',
  6: 'NaN'},
 'Factuurdatum': {0: '2021-05-03',
  1: 'NaN',
  2: 'NaN',
  3: '2021-09-03',
  4: 'NaN',
  5: 'NaN',
  6: 'NaN'},
 'description': {0: 'NaN',
  1: 'TM 300 incl onderstel 3058C003 84433210 4549292119381',
  2: 'ESP 5Y 36 inch 7950A539 00000000 4960999794266',
  3: 'NaN',
  4: 'Basistarief A3 Office',
  5: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021',
  6: 'Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021'},
 'quantity': {0: 'NaN', 1: 1.0, 2: 1.0, 3: 'NaN', 4: 1.0, 5: 1.0, 6: 2.0},
 'unit price': {0: 'NaN',
  1: 1211.63,
  2: 742.79,
  3: 'NaN',
  4: 260.0,
  5: 30.0,
  6: 30.0},
 'line amount': {0: 'NaN',
  1: 21.0,
  2: 21.0,
  3: 'NaN',
  4: 260.0,
  5: 30.0,
  6: 30.0}})

I've tried the following:

df = pd.DataFrame(data=d1)

However failing to accomplish significant results.

Please help !

Upvotes: 2

Views: 57

Answers (1)

piterbarg
piterbarg

Reputation: 8219

Here is what you can do. First we enumerate the groups and the line items within each group, and clean up 'Document Type':

import numpy as np
df['g'] = df['Document Type'].ne(0).cumsum()
df['l'] = df.groupby('g').cumcount()
df['Document Type'] = df['Document Type'].replace(0,np.nan).fillna(method = 'ffill')
df

we get

    Document Type                          Factuurnummer  Factuurdatum    description                                                                 quantity    unit price    line amount    g    l
--  -----------------------------------  ---------------  --------------  ------------------------------------------------------------------------  ----------  ------------  -------------  ---  ---
 0  IngramMicro.AccountsPayable.Invoice      0.788618     2021-05-03      NaN                                                                              nan        nan               nan    1    0
 1  IngramMicro.AccountsPayable.Invoice    nan            NaN             TM 300 incl onderstel 3058C003 84433210 4549292119381                              1       1211.63             21    1    1
 2  IngramMicro.AccountsPayable.Invoice    nan            NaN             ESP 5Y 36 inch 7950A539 00000000 4960999794266                                     1        742.79             21    1    2
 3  IngramMicro.AccountsPayable.Invoice      2.02131e+08  2021-09-03      NaN                                                                              nan        nan               nan    2    0
 4  IngramMicro.AccountsPayable.Invoice    nan            NaN             Basistarief A3 Office                                                              1        260               260    2    1
 5  IngramMicro.AccountsPayable.Invoice    nan            NaN             Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021           1         30                30    2    2
 6  IngramMicro.AccountsPayable.Invoice    nan            NaN             Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021           2         30                30    2    3

Now we can index on 'g' and 'l' and then move 'l' to columns via unstack. we drop columns that are all NaNs

df2 = df.set_index(['g','Document Type','l']).unstack(level = 2).replace('NaN',np.nan).dropna(axis='columns', how = 'all')

we rename column labels to be single-level:

df2.columns = [tup[0] + '_' + str(tup[1]) for tup in df2.columns.values]
df2.reset_index().drop(columns = 'g')

and we get something that looks like what you are after, I believe

    Document Type                          Factuurnummer_0  Factuurdatum_0    description_1                                          description_2                                                             description_3                                                               quantity_1    quantity_2    quantity_3    unit price_1    unit price_2    unit price_3    line amount_1    line amount_2    line amount_3
--  -----------------------------------  -----------------  ----------------  -----------------------------------------------------  ------------------------------------------------------------------------  ------------------------------------------------------------------------  ------------  ------------  ------------  --------------  --------------  --------------  ---------------  ---------------  ---------------
 0  IngramMicro.AccountsPayable.Invoice        0.788618     2021-05-03        TM 300 incl onderstel 3058C003 84433210 4549292119381  ESP 5Y 36 inch 7950A539 00000000 4960999794266                            nan                                                                                  1             1           nan         1211.63          742.79             nan               21               21              nan
 1  IngramMicro.AccountsPayable.Invoice        2.02131e+08  2021-09-03        Basistarief A3 Office                                  Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021  Toeslag 100 km enkele reis Leveren installeren Xerox VL C7020 05-03-2021             1             1             2          260              30                 30              260               30               30

Upvotes: 1

Related Questions