Reputation: 159
I have a Pandas DataFrame where respondents answer the same question over multiple sales accounts. My input Dataframe is of the following format
df = pd.DataFrame({"Sales_Acc1":[100,300],
"Sales_Acc2":[200,500],
"Time_Acc1":[2,5],
"Time_acc2":[6,9],
"Acc_Number_acc1":[1001,1005],
"Acc_Number_acc2":[1009,1010]},
index=["John","Dave"])
df
>>> Sales_Acc1 Sales_Acc2 Time_Acc1 Time_acc2 Acc_Number_acc1 Acc_Number_acc2
John 100 200 2 6 1001 1009
Dave 300 500 5 9 1005 1010
I want to pivot this so that each account would have its own row. My desired end Dataframe would look like:
df
>>> AccountNumber Rep Sales Time
1001 John 100 2
1005 John 300 6
1009 Dave 200 5
1010 Dave 500 9
I have tried using melt as well as pivot but I cannot figure it out. I appreciate any assistance.
Upvotes: 1
Views: 90
Reputation: 486
df1=df[['Sales_Acc1','Time_Acc1','Acc_Number_acc1']]
df2=df[['Sales_Acc2','Time_acc2','Acc_Number_acc2']]
df1.columns=['Sales_Acc','Time_Acc','Acc_Number']
df2.columns=['Sales_Acc','Time_Acc','Acc_Number']
df3 = df1.append(df2)
df3.index.names = ['Rep']
My solution is as simple as this and it would work in this case we are basically putting columns in different rows.
First selecting the relevant columns and then renaming the column names and appending it to the other to get the final result.
This is the full code:
import pandas as pd
df = pd.DataFrame({"Sales_Acc1":[100,300],
"Sales_Acc2":[200,500],
"Time_Acc1":[2,5],
"Time_acc2":[6,9],
"Acc_Number_acc1":[1001,1005],
"Acc_Number_acc2":[1009,1010]},
index=["John","Dave"])
df1=df[['Sales_Acc1','Time_Acc1','Acc_Number_acc1']]
df2=df[['Sales_Acc2','Time_acc2','Acc_Number_acc2']]
df1.columns=['Sales_Acc','Time_Acc','Acc_Number']
df2.columns=['Sales_Acc','Time_Acc','Acc_Number']
df3 = df1.append(df2)
df3.index.names = ['Rep']
df3.head()
Output:
Upvotes: 2
Reputation: 59549
This is a wide_to_long
problem given your column are in the format 'stub_SomeSuffix'
. Because you have some inconsistent cases we'll make everything lower. We also need to remove the names from the index
since wide_to_long
requires columns.
df.columns = df.columns.str.lower()
df = (pd.wide_to_long(df.rename_axis('Rep').reset_index(), # Use Rep as index
i='Rep', # index of output
j='will_drop', # Suffix labels
stubnames=['sales', 'acc_number', 'time'],
sep='_',
suffix='.*')
.reset_index()
.drop(columns='will_drop'))
# Rep sales acc_number time
#0 John 100 1001 2
#1 Dave 300 1005 5
#2 John 200 1009 6
#3 Dave 500 1010 9
If you aren't a fan of the endless arguments required with wide_to_long
we can instead create a simple MultiIndex
on the columns and then this is a stack
. Again because of inconsistent casing we make all the columns fully lower case.
df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.lower().str.rsplit('_', n=1)))
# sales time acc_number
# acc1 acc2 acc1 acc2 acc1 acc2
#John 100 200 2 6 1001 1009
#Dave 300 500 5 9 1005 1010
df.stack(-1).reset_index(-1, drop=True)
# acc_number sales time
#John 1001 100 2
#John 1009 200 6
#Dave 1005 300 5
#Dave 1010 500 9
Upvotes: 3