Matthieu Beaudet
Matthieu Beaudet

Reputation: 11

How to do partial dataframe transposition in pandas

I have a dataframe like this:

+---------+--------------+-------+-------------+
| Customer|    invoice   | Amount| invoice type|
+---------+--------------+-------+-------------+
| Frank   |  invoice 1 n°|   150 |    type1    |
| Frank   |  invoice 2 n°|   300 |    type2    |
| Peter   |  invoice 1 n°|   450 |    type2    |
+---------+--------------+-------+-------------+

In which each line corresponds to an invoice

And I need to transform it into this:

+---------+-------------+--------+--------------+------------+--------+--------------+
| Customer|  invoice1   | Amount1| invoice1 type|  invoice2  | Amount2| invoice2 type|
+---------+-------------+--------+--------------+------------+--------+--------------+
| Frank   |  invoice1 n°|   150  |    type1     | invoice2 n°|  300   |     type2    |
| Peter   |  invoice1 n°|   450  |    type2     |            |        |              |
+---------+-------------+--------+--------------+------------+--------+--------------+

In which each line corresponds to a customer and the different invoices information are transposed to columns. If a client has more invoices than there are columns, new columns need to be created to include all the invoice information.

I see how to do a simple transposition in pandas, but I have absolutely no idea how I could do this easily.

Any help would be very appreciated

Upvotes: 1

Views: 199

Answers (1)

r-beginners
r-beginners

Reputation: 35275

To find the number of columns to expand horizontally, we find the cumulative value in ID. The number of columns is transformed with the pivot_table(). Then, it updates the table with the new column name.

df['flg'] = 1
df['flg'] = df.groupby('Customer')['flg'].transform(pd.Series.cumsum)
df2 = pd.pivot_table(df, index=['Customer'], values=['invoice','Amountinvoice','type'], columns=['flg'], fill_value='', aggfunc=lambda x: x)
new_cols = ['{}_{}'.format(x,y) for x,y in df2.columns]
df2.columns = new_cols
df2.reset_index(inplace=True)
df2 = df2.loc[:,['Customer','invoice_1','Amountinvoice_1','type_1','invoice_2','Amountinvoice_2','type_2']]

df2
|    | Customer   | invoice_1    |   Amountinvoice_1 | type_1   | invoice_2    | Amountinvoice_2   | type_2   |
|---:|:-----------|:-------------|------------------:|:---------|:-------------|:------------------|:---------|
|  0 | Frank      | invoice 1 n° |               150 | type1    | invoice 2 n° | 300.0             | type2    |
|  1 | Peter      | invoice 1 n° |               450 | type2    |              |                   |          |

Upvotes: 1

Related Questions