ceharep
ceharep

Reputation: 439

Pandas pivot tables - different prefixes for different values in new resulting table?

I'm trying to convert a table using a pivot to get columns for Sales and Profit for each Year. I want to be able to prefix the new columns appropriately, i.e. sales_(Year), profit_(Year)?

Is there a way of adding different prefixes in pandas for the different values?

Example:
d = {'SourceID': [1, 1, 2, 2, 3, 3, 3], 'Year': [0, 1, 0, 1, 1, 2, 3], 'Sales': [100, 200, 300, 400 , 500, 600, 700], 'Profit': [10, 20, 30, 40, 50, 60, 70]}
df = pd.DataFrame(data=d)

I can get the information into the appropriate structure using the following:

result = (
  df
  .pivot_table(
    index=['SourceID'],
    columns=['Year'],
    values=['Sales', 'Profit'],
    fill_value=0,
    aggfunc='mean'
  )
  .add_prefix('sales_')
  .reset_index()
)

However I can't work out how to add the prefixes separately for Sales and Profit. At the moment I'm stuck with just sales_ for everything.

Current Output:
|      | ('SourceID', '') | ('sales_Profit', 'sales_0') | ('sales_Profit', 'sales_1') | ('sales_Profit', 'sales_2') | ('sales_Profit', 'sales_3') | ('sales_Sales', 'sales_0') | ('sales_Sales', 'sales_1') | ('sales_Sales', 'sales_2') | ('sales_Sales', 'sales_3') |
| ---: | ---------------: | --------------------------: | --------------------------: | --------------------------: | --------------------------: | -------------------------: | -------------------------: | -------------------------: | -------------------------: |
|    0 |                1 |                          10 |                          20 |                           0 |                           0 |                        100 |                        200 |                          0 |                          0 |
|    1 |                2 |                          30 |                          40 |                           0 |                           0 |                        300 |                        400 |                          0 |                          0 |
|    2 |                3 |                           0 |                          50 |                          60 |                          70 |                          0 |                        500 |                        600 |                        700 |
Desired Output:
|      | ('SourceID', '') | ('Profit', 'profit_0') | ('Profit', 'profit_1') | ('Profit', 'profit_2') | ('Profit', 'profit_3') | ('Sales', 'sales_0') | ('Sales', 'sales_1') | ('Sales', 'sales_2') | ('Sales', 'sales_3') |
| ---: | ---------------: | ---------------------: | --------------------: | ---------------------: | ---------------------: | -------------------: | -------------------: | -------------------: | -------------------: |
|    0 |                1 |                     10 |                    20 |                      0 |                      0 |                  100 |                  200 |                    0 |                    0 |
|    1 |                2 |                     30 |                    40 |                      0 |                      0 |                  300 |                  400 |                    0 |                    0 |
|    2 |                3 |                      0 |                    50 |                     60 |                     70 |                    0 |                  500 |                  600 |                  700 |

###### Original Table

|      | SourceID | Year | Sales | Profit |
| ---: | -------: | ---: | ----: | -----: |
|    0 |        1 |    0 |   100 |     10 |
|    1 |        1 |    1 |   200 |     20 |
|    2 |        2 |    0 |   300 |     30 |
|    3 |        2 |    1 |   400 |     40 |
|    4 |        3 |    1 |   500 |     50 |
|    5 |        3 |    2 |   600 |     60 |
|    6 |        3 |    3 |   700 |     70 |

Upvotes: 3

Views: 620

Answers (1)

jezrael
jezrael

Reputation: 862661

Use list comprehension with f-strings for lowercase first level values and added to second level:

result = (
  df
  .pivot_table(
    index=['SourceID'],
    columns=['Year'],
    values=['Sales', 'Profit'],
    fill_value=0,
    aggfunc='mean'
  ))

L = [(a, f'{a.lower()}{b}') for a,b  in result.columns]
result.columns = pd.MultiIndex.from_tuples(L)
result = result.reset_index()
print (result)
  SourceID  Profit                          Sales                     
           profit0 profit1 profit2 profit3 sales0 sales1 sales2 sales3
0        1      10      20       0       0    100    200      0      0
1        2      30      40       0       0    300    400      0      0
2        3       0      50      60      70      0    500    600    700

Upvotes: 1

Related Questions