Long to wide format for multiple column in python

Question

I am having a student examination dataset such as follows,

userid      grade   examid  subject     numberofcorrectanswers  numberofwronganswers
4           5       8       Synonyms            NULL                    NULL
4           5       8       Sentence            NULL                    NULL
4           5       8       Whole Numbers       6                       15
4           5       8       Decimals            4                       10
5           5       9       Synonyms            NULL                    NULL
5           5       9       Sentence            NULL                    NULL
5           5       9       Whole Numbers       5                       12
5           5       9       Decimals            3                       1

I want to convert this long format to wide format, where I can have the data as,

userid      grade   examid      Synonyms_numberofcorrectanswers         Synonyms_numberofwronganswers       Sentence_numberofcorrectanswers         Sentence_numberofwronganswers       Whole_numbers_numberofcorrectanswers        Whole_numbers_numberofwronganswers              Decimals_numberofcorrectanswers         Decimals_numberofwronganswers
4           5           8               NULL                                    NULL                                NULL                                    NULL                                6                                           15                                          4                                           10
5           5           9               NULL                                    NULL                                NULL                                    NULL                                5                                           12                                          3                                            1

The following are my trying,

data_subset.set_index(['userid', 'grade','examid','subject']).unstack('subject').reset_index()

But this is not coming in a single flat dataframe. There are several hierarchies inside it. Can anybody help me in making it a single flat dataframe?

Thanks

ptrj · Accepted Answer

I'll expand Alexander's answer. Say we have

df2 = df.groupby(['userid', 'grade','examid','subject']).sum().unstack('subject')

We get the names of the two-level column index as a list of 2-tuples with df2.columns.get_values(). To flatten it and combine the names:

new_col_names = ['_'.join((b,a)) for a,b in df2.columns.get_values()]
df2.columns = new_col_names

If needed:

to sort columns: for example df2.reindex(columns = sorted(df2.columns))
to set userid etc. as columns rather than a multi-index: df2.reset_index()

Long to wide format for multiple column in python

Answers (2)

Related Questions