Pivoting MultiIndex Data

Question

I have a MultiIndex pandas dataframe that looks like this:

I want the different quarters as rows instead of hierarchical columns, i.e. a long format instead of this wide one. Something like this (the output need not be a MultiIndex):

How can I do this in Pandas?

Edit:

Sample input file as requested:

rbi_credits_data.xlsx

Sample data (pandas):

import pandas as pd

# codes= is the name used since pandas 0.24 (older versions called it labels=)
cols = pd.MultiIndex(levels=[['Center_Details', '2017-18:Q2', '2017-18:Q1'],
                             ['State', 'District', 'Center', 'Offices', 'Deposit', 'Credit']],
                     codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
                            [0, 1, 2, 3, 4, 5, 3, 4, 5]])
data = [['JAMMU & KASHMIR', 'KUPWARA', 'Drug Mulla (CT)', '3', '500', '600', '4', '500', '600'],
        ['JAMMU & KASHMIR', 'LEH LADAKH', 'Chuglamsar (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'PATHANKOT', 'Mamun (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'GURDASPUR', 'TIBRI', '3', '500', '600', '4', '500', '600']]
df = pd.DataFrame(data=data, columns=cols)

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

One approach could be to just flatten the MultiIndex and use melt and pivot_table, something like this:

# Flatten the MultiIndex columns
df.columns = [' '.join(col).strip() for col in df.columns.values]

# Save some typing
idx = ['Center_Details State', 'Center_Details District', 'Center_Details Center']

# Create a long dataframe
long = pd.melt(df, id_vars = idx)

# Split the "variable" column at the space created when flattening the MultiIndex
long['QTR'], long['item'] = zip(*long['variable'].map(lambda x: x.split(' ')))

# Reshape to wide format, keeping "QTR" as a column
out = pd.pivot_table(long, index = idx + ["QTR"], columns = 'item', 
                     values = 'value', aggfunc = 'first').reset_index()
print(out)
item Center_Details State Center_Details District Center_Details Center  \
0         JAMMU & KASHMIR                 KUPWARA       Drug Mulla (CT)   
1         JAMMU & KASHMIR                 KUPWARA       Drug Mulla (CT)   
2         JAMMU & KASHMIR              LEH LADAKH       Chuglamsar (CT)   
3         JAMMU & KASHMIR              LEH LADAKH       Chuglamsar (CT)   
4                  PUNJAB               GURDASPUR                 TIBRI   
5                  PUNJAB               GURDASPUR                 TIBRI   
6                  PUNJAB               PATHANKOT            Mamun (CT)   
7                  PUNJAB               PATHANKOT            Mamun (CT)   

item         QTR Credit Deposit Offices  
0     2017-18:Q1    600     500       4  
1     2017-18:Q2    600     500       3  
2     2017-18:Q1    600     500       4  
3     2017-18:Q2    600     500       3  
4     2017-18:Q1    600     500       4  
5     2017-18:Q2    600     500       3  
6     2017-18:Q1    600     500       4  
7     2017-18:Q2    600     500       3
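Since each combination of center, quarter and item occurs only once here, pivot can stand in for pivot_table. Unlike pivot_table with aggfunc='first', pivot raises a ValueError if a combination is ever duplicated instead of silently keeping one value. A self-contained sketch of that variant, assuming pandas 1.1 or later (for a list-valued index in pivot), which also swaps the zip/map split for str.split(expand=True):

```python
import pandas as pd

# Rebuild the question's sample frame (codes= is the pandas >= 0.24 name for labels=)
cols = pd.MultiIndex(
    levels=[['Center_Details', '2017-18:Q2', '2017-18:Q1'],
            ['State', 'District', 'Center', 'Offices', 'Deposit', 'Credit']],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
           [0, 1, 2, 3, 4, 5, 3, 4, 5]])
data = [['JAMMU & KASHMIR', 'KUPWARA', 'Drug Mulla (CT)', '3', '500', '600', '4', '500', '600'],
        ['JAMMU & KASHMIR', 'LEH LADAKH', 'Chuglamsar (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'PATHANKOT', 'Mamun (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'GURDASPUR', 'TIBRI', '3', '500', '600', '4', '500', '600']]
df = pd.DataFrame(data=data, columns=cols)

# Flatten the columns, e.g. ('2017-18:Q2', 'Offices') -> '2017-18:Q2 Offices'
df.columns = [' '.join(col) for col in df.columns]
idx = ['Center_Details State', 'Center_Details District', 'Center_Details Center']

# Long format: one row per center and flattened column name
long = pd.melt(df, id_vars=idx)

# Split '2017-18:Q2 Offices' at the first space into quarter and item
long[['QTR', 'item']] = long['variable'].str.split(' ', n=1, expand=True)

# pivot (unlike pivot_table) raises if an index/column pair is duplicated
out = long.pivot(index=idx + ['QTR'], columns='item', values='value').reset_index()
print(out)
```

The result has one row per center and quarter, as in the pivot_table output above.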

Another option might be something like:

long = df.set_index(['Center_Details']).stack().T.unstack()
long = pd.concat([pd.DataFrame(long.reset_index()['Center_Details'].tolist()), 
                  long.reset_index()], axis=1)
long.columns = ['State', 'District', 'Center', 'Center_Details', 
                'Items', 'QTR', 'Value']
out = pd.pivot_table(long, index=['State', 'District', 'Center', 'QTR'], 
                     columns='Items', values='Value', 
                     aggfunc='first').reset_index()
print(out)
Items            State    District           Center         QTR Credit  \
0      JAMMU & KASHMIR     KUPWARA  Drug Mulla (CT)  2017-18:Q1    600   
1      JAMMU & KASHMIR     KUPWARA  Drug Mulla (CT)  2017-18:Q2    600   
2      JAMMU & KASHMIR  LEH LADAKH  Chuglamsar (CT)  2017-18:Q1    600   
3      JAMMU & KASHMIR  LEH LADAKH  Chuglamsar (CT)  2017-18:Q2    600   
4               PUNJAB   GURDASPUR            TIBRI  2017-18:Q1    600   
5               PUNJAB   GURDASPUR            TIBRI  2017-18:Q2    600   
6               PUNJAB   PATHANKOT       Mamun (CT)  2017-18:Q1    600   
7               PUNJAB   PATHANKOT       Mamun (CT)  2017-18:Q2    600   

Items Deposit Offices  
0         500       4  
1         500       3  
2         500       4  
3         500       3  
4         500       4  
5         500       3  
6         500       4  
7         500       3
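The set_index(['Center_Details']) call above relies on set_index accepting a label that selects three columns at once, which newer pandas versions may reject. A sketch of the same stack-based idea that moves the three Center_Details columns into the index one tuple key at a time instead, assuming pandas 0.24 or later (for codes=). The item columns are selected explicitly because stack may order them differently across pandas versions, and on pandas 2.1+ stack may also emit a FutureWarning about its new implementation:

```python
import pandas as pd

# Rebuild the question's sample frame (codes= is the pandas >= 0.24 name for labels=)
cols = pd.MultiIndex(
    levels=[['Center_Details', '2017-18:Q2', '2017-18:Q1'],
            ['State', 'District', 'Center', 'Offices', 'Deposit', 'Credit']],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
           [0, 1, 2, 3, 4, 5, 3, 4, 5]])
data = [['JAMMU & KASHMIR', 'KUPWARA', 'Drug Mulla (CT)', '3', '500', '600', '4', '500', '600'],
        ['JAMMU & KASHMIR', 'LEH LADAKH', 'Chuglamsar (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'PATHANKOT', 'Mamun (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'GURDASPUR', 'TIBRI', '3', '500', '600', '4', '500', '600']]
df = pd.DataFrame(data=data, columns=cols)

# Move the three Center_Details columns into the row index, one tuple key each
id_cols = [('Center_Details', name) for name in ['State', 'District', 'Center']]
wide = df.set_index(id_cols)
# Drop 'Center_Details' from the column levels now that no column uses it
wide.columns = wide.columns.remove_unused_levels()

# Stack the quarter level (level 0) of the columns into the rows
long = wide.stack(level=0)
long.index.names = ['State', 'District', 'Center', 'QTR']

# Pick the item columns explicitly, sort the rows, and flatten the index
out = long[['Offices', 'Deposit', 'Credit']].sort_index().reset_index()
print(out)
```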

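A third option is to use wide_to_long. wide_to_long expects each wide-format column name to start with a stub (here Offices, Deposit or Credit), followed by a separator and a suffix (here the quarter), so the column tuples have to be reversed before they are flattened. The approach is similar to the first one, but involves fewer steps.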

It looks something like:

# Flatten the column names, but reverse the order of the tuples
#   before flattening, and add a character to split on
df.columns = ['~'.join(col[::-1]).strip() for col in df.columns.values]

# Reshape the data, Stata-style; suffix='.+' is needed because the
#   default suffix (r'\d+') only matches purely numeric suffixes
out = pd.wide_to_long(df, ['Offices', 'Deposit', 'Credit'],
                      i=['State~Center_Details', 'District~Center_Details', 'Center~Center_Details'],
                      j='Quarter', sep='~', suffix='.+').reset_index()
print(out)

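You'll still have to do some cleanup on the former "Center_Details" columns, whose names come out as "State~Center_Details", "District~Center_Details" and "Center~Center_Details".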
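A self-contained sketch of the wide_to_long route including that cleanup, assuming pandas 1.0 or later. It passes suffix=r'.+' because wide_to_long's default suffix, r'\d+', only matches numeric suffixes and would not pick up labels like '2017-18:Q2'. The cleanup keeps only the part of each column name before the "~", and the final sort is only there to give a predictable row order:

```python
import pandas as pd

# Rebuild the question's sample frame (codes= is the pandas >= 0.24 name for labels=)
cols = pd.MultiIndex(
    levels=[['Center_Details', '2017-18:Q2', '2017-18:Q1'],
            ['State', 'District', 'Center', 'Offices', 'Deposit', 'Credit']],
    codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
           [0, 1, 2, 3, 4, 5, 3, 4, 5]])
data = [['JAMMU & KASHMIR', 'KUPWARA', 'Drug Mulla (CT)', '3', '500', '600', '4', '500', '600'],
        ['JAMMU & KASHMIR', 'LEH LADAKH', 'Chuglamsar (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'PATHANKOT', 'Mamun (CT)', '3', '500', '600', '4', '500', '600'],
        ['PUNJAB', 'GURDASPUR', 'TIBRI', '3', '500', '600', '4', '500', '600']]
df = pd.DataFrame(data=data, columns=cols)

# Reverse each tuple so the stub comes first, e.g. 'Offices~2017-18:Q2'
df.columns = ['~'.join(col[::-1]) for col in df.columns]
id_cols = ['State~Center_Details', 'District~Center_Details', 'Center~Center_Details']

# Any suffix after '~' is accepted, so the quarter labels become the 'Quarter' values
out = pd.wide_to_long(df, ['Offices', 'Deposit', 'Credit'],
                      i=id_cols, j='Quarter', sep='~', suffix=r'.+').reset_index()

# Cleanup: 'State~Center_Details' -> 'State', and so on
out.columns = [c.split('~')[0] for c in out.columns]
out = out.sort_values(['State', 'District', 'Center', 'Quarter'], ignore_index=True)
print(out)
```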
