Pandas Pivot Table with Repeating Column Categories

Question

I have a dataframe that looks like this:

df = pd.DataFrame({"Object": ['Apple', 'Orange', 'Banana', 'Grape', 'Cherry'], 
                   "Jan 01 Vol": [0, 5, 2, 4, 8],
                  "Jan 01 Price": [1.15, 2.30, 1.75, 3.4, 2.5],
                  "Jan 01 Sales": [0, 11.5, 5.25, 13.6, 20],
                  "Jan 02 Vol": [1, 2, 3, 4, 5],
                  "Jan 02 Price": [1.15, 2.30, 1.75, 3.4, 2.5],
                  "Jan 02 Sales": [1.15, 4.6, 5.25, 13.6, 12.5],
                  "Feb 01 Vol": [5, 4, 3, 2, 1],
                  "Feb 01 Price": [1.15, 2.30, 1.75, 3.4, 2.5],
                  "Feb 01 Sales": [5.75, 9.2, 5.25, 6.8, 2.5],})

I want to be able to manipulate the dataframe such that "Vol", "Price", "Sales" are their own columns while pivoting the date aspect of the column vertically so that it looks like this:

df2 = pd.DataFrame({"Object": ['Apple', 'Apple', 'Apple', 
                               'Orange','Orange', 'Orange', 
                               'Banana', 'Banana', 'Banana', 
                               'Grape', 'Grape', 'Grape', 
                               'Cherry', 'Cherry', 'Cherry'], 
                    "Year": [2001, 2001, 2002, 
                             2001, 2001, 2002, 
                             2001, 2001, 2002, 
                             2001, 2001, 2002, 
                             2001, 2001, 2002],
                   "Month": [1, 2, 1, 
                             1, 2, 1, 
                             1, 2, 1, 
                             1, 2, 1, 
                             1, 2, 1],
                    "Vol": [0, 5, 1, 5, 4, 2, 2, 3, 3, 4, 2, 4, 8, 1, 5],
                   "Price": [1.15, 1.15, 1.15, 2.30, 2.30, 2.30, 1.75, 1.75, 1.75, 3.4, 3.4, 3.4, 2.5, 2.5, 2.5],
                   "Sales": [0, 5.75, 1.15, 11.50, 9.2, 4.6, 5.25, 5.25, 5.25, 13.60, 6.8, 13.60, 20, 2.5, 12.5]})

I thought about doing a lambda function that creates a new column that pulls in the year from the horizontal column names, but that won't work because the array lengths are different. I also thought about doing a pivot table, but again, wasn't sure how to parse out the "Vol", "Price" "Sales" aspects of the columns into their own column. Any help would be appreciated.

Jondiedoop · Accepted Answer

dfm = df.melt(id_vars='Object')

df3 = pd.concat([dfm[['Object', 'value']], dfm['variable'].str.split(expand=True)], axis=1)
df3.rename(columns={0: 'Month', 1: 'Year', 2:'Type'}, inplace=True)
df3 = df3.set_index(['Object', 'Year', 'Month', 'Type']).unstack()['value'].reset_index()
df3['Year'] = df3['Year'].astype(int)+2000
df3['Month'] = pd.to_datetime(df3['Month'], format='%b').dt.month

#Output
#Type  Object  Year  Month  Price  Sales  Vol
#0      Apple  2001      2   1.15   5.75  5.0
#1      Apple  2001      1   1.15   0.00  0.0
#2      Apple  2002      1   1.15   1.15  1.0
#3     Banana  2001      2   1.75   5.25  3.0
#4     Banana  2001      1   1.75   5.25  2.0
#5     Banana  2002      1   1.75   5.25  3.0
#6     Cherry  2001      2   2.50   2.50  1.0
#7     Cherry  2001      1   2.50  20.00  8.0
#8     Cherry  2002      1   2.50  12.50  5.0
#9      Grape  2001      2   3.40   6.80  2.0
#10     Grape  2001      1   3.40  13.60  4.0
#11     Grape  2002      1   3.40  13.60  4.0
#12    Orange  2001      2   2.30   9.20  4.0
#13    Orange  2001      1   2.30  11.50  5.0
#14    Orange  2002      1   2.30   4.60  2.0

I would first use pd.melt to do the transformations. Use .str.split with expand=True to split the information in the column variable (which was constructed by pd.melt from the columns) into three separate columns and rename them to something meaningful. Then use set_index so we can unstack, which spreads the information over three columns from the long form into wide-form, as desired. Lastly, alter the datetime characteristics into the numbers you desired.

Hope that helps

Pandas Pivot Table with Repeating Column Categories

Answers (2)

Related Questions