thedatasleuth
thedatasleuth

Reputation: 549

Pandas Pivot Table with Repeating Column Categories

I have a dataframe that looks like this:

df = pd.DataFrame({"Object": ['Apple', 'Orange', 'Banana', 'Grape', 'Cherry'], 
                   "Jan 01 Vol": [0, 5, 2, 4, 8],
                  "Jan 01 Price": [1.15, 2.30, 1.75, 3.4, 2.5],
                  "Jan 01 Sales": [0, 11.5, 5.25, 13.6, 20],
                  "Jan 02 Vol": [1, 2, 3, 4, 5],
                  "Jan 02 Price": [1.15, 2.30, 1.75, 3.4, 2.5],
                  "Jan 02 Sales": [1.15, 4.6, 5.25, 13.6, 12.5],
                  "Feb 01 Vol": [5, 4, 3, 2, 1],
                  "Feb 01 Price": [1.15, 2.30, 1.75, 3.4, 2.5],
                  "Feb 01 Sales": [5.75, 9.2, 5.25, 6.8, 2.5],})

I want to be able to manipulate the dataframe such that "Vol", "Price", "Sales" are their own columns while pivoting the date aspect of the column vertically so that it looks like this:

df2 = pd.DataFrame({"Object": ['Apple', 'Apple', 'Apple', 
                               'Orange','Orange', 'Orange', 
                               'Banana', 'Banana', 'Banana', 
                               'Grape', 'Grape', 'Grape', 
                               'Cherry', 'Cherry', 'Cherry'], 
                    "Year": [2001, 2001, 2002, 
                             2001, 2001, 2002, 
                             2001, 2001, 2002, 
                             2001, 2001, 2002, 
                             2001, 2001, 2002],
                   "Month": [1, 2, 1, 
                             1, 2, 1, 
                             1, 2, 1, 
                             1, 2, 1, 
                             1, 2, 1],
                    "Vol": [0, 5, 1, 5, 4, 2, 2, 3, 3, 4, 2, 4, 8, 1, 5],
                   "Price": [1.15, 1.15, 1.15, 2.30, 2.30, 2.30, 1.75, 1.75, 1.75, 3.4, 3.4, 3.4, 2.5, 2.5, 2.5],
                   "Sales": [0, 5.75, 1.15, 11.50, 9.2, 4.6, 5.25, 5.25, 5.25, 13.60, 6.8, 13.60, 20, 2.5, 12.5]})

I thought about doing a lambda function that creates a new column that pulls in the year from the horizontal column names, but that won't work because the array lengths are different. I also thought about doing a pivot table, but again, wasn't sure how to parse out the "Vol", "Price" "Sales" aspects of the columns into their own column. Any help would be appreciated.

Upvotes: 2

Views: 694

Answers (2)

Jondiedoop
Jondiedoop

Reputation: 3353

dfm = df.melt(id_vars='Object')

df3 = pd.concat([dfm[['Object', 'value']], dfm['variable'].str.split(expand=True)], axis=1)
df3.rename(columns={0: 'Month', 1: 'Year', 2:'Type'}, inplace=True)
df3 = df3.set_index(['Object', 'Year', 'Month', 'Type']).unstack()['value'].reset_index()
df3['Year'] = df3['Year'].astype(int)+2000
df3['Month'] = pd.to_datetime(df3['Month'], format='%b').dt.month

#Output
#Type  Object  Year  Month  Price  Sales  Vol
#0      Apple  2001      2   1.15   5.75  5.0
#1      Apple  2001      1   1.15   0.00  0.0
#2      Apple  2002      1   1.15   1.15  1.0
#3     Banana  2001      2   1.75   5.25  3.0
#4     Banana  2001      1   1.75   5.25  2.0
#5     Banana  2002      1   1.75   5.25  3.0
#6     Cherry  2001      2   2.50   2.50  1.0
#7     Cherry  2001      1   2.50  20.00  8.0
#8     Cherry  2002      1   2.50  12.50  5.0
#9      Grape  2001      2   3.40   6.80  2.0
#10     Grape  2001      1   3.40  13.60  4.0
#11     Grape  2002      1   3.40  13.60  4.0
#12    Orange  2001      2   2.30   9.20  4.0
#13    Orange  2001      1   2.30  11.50  5.0
#14    Orange  2002      1   2.30   4.60  2.0

I would first use pd.melt to do the transformations. Use .str.split with expand=True to split the information in the column variable (which was constructed by pd.melt from the columns) into three separate columns and rename them to something meaningful. Then use set_index so we can unstack, which spreads the information over three columns from the long form into wide-form, as desired. Lastly, alter the datetime characteristics into the numbers you desired.

Hope that helps

Upvotes: 3

Scott Boston
Scott Boston

Reputation: 153460

You can use pd.wide_to_long with some column renaming and pd.to_datetime with .dt accessor to get year and month attributes:

df = df.set_index('Object')
df.columns =  df.columns.str.replace(r'(.+) ([Vol|Price|Sales]+)',r'\2_\1')
df_out = pd.wide_to_long(df.reset_index(),['Vol','Price','Sales'],'Object','Months','_','.+')
df_out = df_out.reset_index()
df_out['Months'] = pd.to_datetime(df_out['Months'], format='%b %y')
df_out['Year'] = df_out['Months'].dt.year
df_out['Month'] = df_out['Months'].dt.month
df_out = df_out.drop('Months', axis=1).sort_values(['Object','Year'])
print(df_out)

Output:

    Object  Vol  Price  Sales  Year  Month
0    Apple    0   1.15   0.00  2001      1
10   Apple    5   1.15   5.75  2001      2
5    Apple    1   1.15   1.15  2002      1
2   Banana    2   1.75   5.25  2001      1
12  Banana    3   1.75   5.25  2001      2
7   Banana    3   1.75   5.25  2002      1
4   Cherry    8   2.50  20.00  2001      1
14  Cherry    1   2.50   2.50  2001      2
9   Cherry    5   2.50  12.50  2002      1
3    Grape    4   3.40  13.60  2001      1
13   Grape    2   3.40   6.80  2001      2
8    Grape    4   3.40  13.60  2002      1
1   Orange    5   2.30  11.50  2001      1
11  Orange    4   2.30   9.20  2001      2
6   Orange    2   2.30   4.60  2002      1

Upvotes: 2

Related Questions