Which Pandas function do I need? group_by or pivot

Question

I'm still relatively new to Pandas and I can't tell which of the functions I'm best off using to get to my answer. I have looked at pivot, pivot_table, group_by and aggregate but I can't seem to get it to do what I require. Quite possibly user error, for which I apologise!

I have data like this:

Code to create df:

import pandas as pd
df = pd.DataFrame([
    ['1', '1', 'A', 3, 7],
    ['1', '1', 'B', 2, 9],
    ['1', '1', 'C', 2, 9],
    ['1', '2', 'A', 4, 10],
    ['1', '2', 'B', 4, 0],
    ['1', '2', 'C', 9, 8],
    ['2', '1', 'A', 3, 8],
    ['2', '1', 'B', 10, 4],
    ['2', '1', 'C', 0, 1],
    ['2', '2', 'A', 1, 6],
    ['2', '2', 'B', 10, 2],
    ['2', '2', 'C', 10, 3]
], columns = ['Field1', 'Field2', 'Type', 'Price1', 'Price2'])
print(df)

I am trying to get data like this:

Although my end goal will be to end up with one column for A, one for B and one for C. As A will use Price1 and B & C will use Price2.

I don't want to necessarily get the max or min or average or sum of the Price as theoretically (although unlikely) there could be two different Price1's for the same Fields & Type.

What's the best function to use in Pandas to get to what I need?

jezrael · Accepted Answer

Use DataFrame.set_index with DataFrame.unstack for reshape - output is MultiIndex in columns, so added sorting second level by DataFrame.sort_index, flatten values and last create column from Field levels:

df1 = (df.set_index(['Field1','Field2', 'Type'])
         .unstack(fill_value=0)
         .sort_index(axis=1, level=1))
df1.columns = [f'{b}-{a}' for a, b in df1.columns]
df1 = df1.reset_index()
print (df1)
  Field1 Field2  A-Price1  A-Price2  B-Price1  B-Price2  C-Price1  C-Price2
0      1      1         3         7         2         9         2         9
1      1      2         4        10         4         0         9         8
2      2      1         3         8        10         4         0         1
3      2      2         1         6        10         2        10         3

Solution with DataFrame.pivot_table is also possible, but it aggregate values in duplicates first 3 columns with default mean function:

df2 = (df.pivot_table(index=['Field1','Field2'],
                      columns='Type',
                      values=['Price1', 'Price2'],
                      aggfunc='mean')
         .sort_index(axis=1, level=1))
df2.columns = [f'{b}-{a}' for a, b in df2.columns]
df2 = df2.reset_index()
print (df2)

Which Pandas function do I need? group_by or pivot

Answers (2)

Related Questions