Victor Nogueira
Victor Nogueira

Reputation: 159

How can I reshape a repeating wide pandas DataFrame to be stacked?

I have a Pandas DataFrame where respondents answer the same question over multiple sales accounts. My input Dataframe is of the following format

df = pd.DataFrame({"Sales_Acc1":[100,300],
              "Sales_Acc2":[200,500],
              "Time_Acc1":[2,5],
              "Time_acc2":[6,9],
              "Acc_Number_acc1":[1001,1005],
              "Acc_Number_acc2":[1009,1010]},
             index=["John","Dave"])
df
>>> Sales_Acc1  Sales_Acc2  Time_Acc1   Time_acc2   Acc_Number_acc1 Acc_Number_acc2
John    100     200          2          6           1001            1009
Dave    300     500          5          9           1005            1010

I want to pivot this so that each account would have its own row. My desired end Dataframe would look like:

df
>>> AccountNumber   Rep   Sales   Time 
     1001           John  100     2
     1005           John  300     6
     1009           Dave  200     5
     1010           Dave  500     9

I have tried using melt as well as pivot but I cannot figure it out. I appreciate any assistance.

Upvotes: 1

Views: 90

Answers (2)

user27286
user27286

Reputation: 486

df1=df[['Sales_Acc1','Time_Acc1','Acc_Number_acc1']]
df2=df[['Sales_Acc2','Time_acc2','Acc_Number_acc2']]
df1.columns=['Sales_Acc','Time_Acc','Acc_Number']
df2.columns=['Sales_Acc','Time_Acc','Acc_Number']
df3 = df1.append(df2)
df3.index.names = ['Rep']

My solution is as simple as this and it would work in this case we are basically putting columns in different rows.

First selecting the relevant columns and then renaming the column names and appending it to the other to get the final result.

This is the full code:

import pandas as pd
df = pd.DataFrame({"Sales_Acc1":[100,300],
              "Sales_Acc2":[200,500],
              "Time_Acc1":[2,5],
              "Time_acc2":[6,9],
              "Acc_Number_acc1":[1001,1005],
              "Acc_Number_acc2":[1009,1010]},
             index=["John","Dave"])
df1=df[['Sales_Acc1','Time_Acc1','Acc_Number_acc1']]
df2=df[['Sales_Acc2','Time_acc2','Acc_Number_acc2']]
df1.columns=['Sales_Acc','Time_Acc','Acc_Number']
df2.columns=['Sales_Acc','Time_Acc','Acc_Number']
df3 = df1.append(df2)
df3.index.names = ['Rep']
df3.head()

Output:

enter image description here

Upvotes: 2

ALollz
ALollz

Reputation: 59549

This is a wide_to_long problem given your column are in the format 'stub_SomeSuffix'. Because you have some inconsistent cases we'll make everything lower. We also need to remove the names from the index since wide_to_long requires columns.

df.columns = df.columns.str.lower()

df = (pd.wide_to_long(df.rename_axis('Rep').reset_index(),    # Use Rep as index
                      i='Rep',                                # index of output
                      j='will_drop',                          # Suffix labels  
                      stubnames=['sales', 'acc_number', 'time'],
                      sep='_', 
                      suffix='.*') 
        .reset_index()
        .drop(columns='will_drop'))

#    Rep  sales  acc_number  time
#0  John    100        1001     2
#1  Dave    300        1005     5
#2  John    200        1009     6
#3  Dave    500        1010     9

If you aren't a fan of the endless arguments required with wide_to_long we can instead create a simple MultiIndex on the columns and then this is a stack. Again because of inconsistent casing we make all the columns fully lower case.

df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.lower().str.rsplit('_', n=1)))
#     sales      time      acc_number      
#      acc1 acc2 acc1 acc2       acc1  acc2
#John   100  200    2    6       1001  1009
#Dave   300  500    5    9       1005  1010

df.stack(-1).reset_index(-1, drop=True)
#      acc_number  sales  time
#John        1001    100     2
#John        1009    200     6
#Dave        1005    300     5
#Dave        1010    500     9

Upvotes: 3

Related Questions