haimen
haimen

Reputation: 2015

Long to wide format for multiple column in python

I am having a student examination dataset such as follows,

userid      grade   examid  subject     numberofcorrectanswers  numberofwronganswers
4           5       8       Synonyms            NULL                    NULL
4           5       8       Sentence            NULL                    NULL
4           5       8       Whole Numbers       6                       15
4           5       8       Decimals            4                       10
5           5       9       Synonyms            NULL                    NULL
5           5       9       Sentence            NULL                    NULL
5           5       9       Whole Numbers       5                       12
5           5       9       Decimals            3                       1

I want to convert this long format to wide format, where I can have the data as,

userid      grade   examid      Synonyms_numberofcorrectanswers         Synonyms_numberofwronganswers       Sentence_numberofcorrectanswers         Sentence_numberofwronganswers       Whole_numbers_numberofcorrectanswers        Whole_numbers_numberofwronganswers              Decimals_numberofcorrectanswers         Decimals_numberofwronganswers
4           5           8               NULL                                    NULL                                NULL                                    NULL                                6                                           15                                          4                                           10
5           5           9               NULL                                    NULL                                NULL                                    NULL                                5                                           12                                          3                                            1

The following are my trying,

data_subset.set_index(['userid', 'grade','examid','subject']).unstack('subject').reset_index()

But this is not coming in a single flat dataframe. There are several hierarchies inside it. Can anybody help me in making it a single flat dataframe?

Thanks

Upvotes: 3

Views: 2389

Answers (2)

ptrj
ptrj

Reputation: 5222

I'll expand Alexander's answer. Say we have

df2 = df.groupby(['userid', 'grade','examid','subject']).sum().unstack('subject')

We get the names of the two-level column index as a list of 2-tuples with df2.columns.get_values(). To flatten it and combine the names:

new_col_names = ['_'.join((b,a)) for a,b in df2.columns.get_values()]
df2.columns = new_col_names

If needed:

  • to sort columns: for example df2.reindex(columns = sorted(df2.columns))

  • to set userid etc. as columns rather than a multi-index: df2.reset_index()

Upvotes: 1

Alexander
Alexander

Reputation: 109626

Something like this?

>>> df.groupby(['userid', 'grade','examid','subject']).sum().unstack('subject')
                    numberofcorrectanswers                                 numberofwronganswers                                
subject                           Decimals Sentence Synonyms Whole Numbers             Decimals Sentence Synonyms Whole Numbers
userid grade examid                                                                                                            
4      5     8                           4      NaN      NaN             6                   10      NaN      NaN            15
5      5     9                           3      NaN      NaN             5                    1      NaN      NaN            12

Upvotes: 2

Related Questions