Reputation: 2015
I am having a student examination dataset such as follows,
userid grade examid subject numberofcorrectanswers numberofwronganswers
4 5 8 Synonyms NULL NULL
4 5 8 Sentence NULL NULL
4 5 8 Whole Numbers 6 15
4 5 8 Decimals 4 10
5 5 9 Synonyms NULL NULL
5 5 9 Sentence NULL NULL
5 5 9 Whole Numbers 5 12
5 5 9 Decimals 3 1
I want to convert this long format to wide format, where I can have the data as,
userid grade examid Synonyms_numberofcorrectanswers Synonyms_numberofwronganswers Sentence_numberofcorrectanswers Sentence_numberofwronganswers Whole_numbers_numberofcorrectanswers Whole_numbers_numberofwronganswers Decimals_numberofcorrectanswers Decimals_numberofwronganswers
4 5 8 NULL NULL NULL NULL 6 15 4 10
5 5 9 NULL NULL NULL NULL 5 12 3 1
The following are my trying,
data_subset.set_index(['userid', 'grade','examid','subject']).unstack('subject').reset_index()
But this is not coming in a single flat dataframe. There are several hierarchies inside it. Can anybody help me in making it a single flat dataframe?
Thanks
Upvotes: 3
Views: 2389
Reputation: 5222
I'll expand Alexander's answer. Say we have
df2 = df.groupby(['userid', 'grade','examid','subject']).sum().unstack('subject')
We get the names of the two-level column index as a list of 2-tuples with df2.columns.get_values()
. To flatten it and combine the names:
new_col_names = ['_'.join((b,a)) for a,b in df2.columns.get_values()]
df2.columns = new_col_names
If needed:
to sort columns: for example df2.reindex(columns = sorted(df2.columns))
to set userid
etc. as columns rather than a multi-index: df2.reset_index()
Upvotes: 1
Reputation: 109626
Something like this?
>>> df.groupby(['userid', 'grade','examid','subject']).sum().unstack('subject')
numberofcorrectanswers numberofwronganswers
subject Decimals Sentence Synonyms Whole Numbers Decimals Sentence Synonyms Whole Numbers
userid grade examid
4 5 8 4 NaN NaN 6 10 NaN NaN 15
5 5 9 3 NaN NaN 5 1 NaN NaN 12
Upvotes: 2