Pivoting table, while preserving all columns not involved

Question

This code will generate a very simple dummy dataframe, where people filled a survey form:

df2 = pd.DataFrame({
    'name':['John','John','John','Rachel','Rachel','Rachel'],
    'gender':['Male','Male','Male','Female','Female','Female'],
    'age':[40,40,40,39,39,39],
    'SurveyQuestion':['Married?','HasKids?','Smokes?','Married?','HasKids?','Smokes?'],
    'answers':['Yes','Yes','No','Yes','No','No']
})

The output looks like so:

Because of the way the table is structured, with each question having its own row, we see that the first 3 columns always contain the same info, as it's just repeating the info based on the person that filled in the survey.

It would be better to visualize the dataframe as a pivot-table, similar to the following:

df2.pivot(index='name',columns='SurveyQuestion',values='answers')

However, doing it this way results in many of the previous columns being lost, since only 1 column can be used as the index.

I'm wondering what the most straightforward way of doing this would be that didn't involve an extra step of rejoining the columns.

Mayank Porwal · Accepted Answer

You can use df.pivot_table:

In [27]: df2.pivot_table(values='answers', index=['name','gender','age'], columns='SurveyQuestion', aggfunc='first')
Out[27]: 
SurveyQuestion    HasKids? Married? Smokes?
name   gender age                          
John   Male   40       Yes      Yes      No
Rachel Female 39        No      Yes      No

OR, you can use df.pivot with df.set_index, like this:

In [30]: df = df2.set_index(['name', 'gender', 'age'])
In [32]: df.pivot(index=df.index, columns='SurveyQuestion')['answers']
Out[32]: 
SurveyQuestion    HasKids? Married? Smokes?
name   gender age                          
John   Male   40       Yes      Yes      No
Rachel Female 39        No      Yes      No

Pivoting table, while preserving all columns not involved

Answers (2)

Related Questions