Reputation: 103
Dataset and Notebook file : https://drive.google.com/drive/folders/14z16wOEjKe299oSxu_wlh5Zr-dfbnXgE?usp=sharing
Can anyone help me out on this?
I have a dataframe named dfm2.
I want to see how many questions each user has attempted in total, and how many of those were answered correctly and incorrectly. I would then like to plot this, with the percentage on the Y axis and one entry per user on the X axis.
question_id : contains the question number
user_answer : contains what the user answered for that question (a, b, c or d)
user_iD : identifies each user
correct_answer : it's basically the answer key.
user_correct : it's 0 if user answer is incorrect and 1 if user answers correctly
What I have tried so far
df_total_questions_attempted = dfm2.groupby(['user_iD'])['question_id'].count().to_frame('Total Questions Attempted')
df_correct = dfm2[dfm2['user_correct']==1].groupby(['user_iD'])['question_id'].count().to_frame('Correct')
df_incorrect = dfm2[dfm2['user_correct']==0].groupby(['user_iD'])['question_id'].count().to_frame('Incorrect')
df = pd.concat([df_total_questions_attempted, df_correct, df_incorrect], axis=1).fillna(0)
df['Percentage'] = (df['Correct'] / df['Total Questions Attempted']) *100
THIS IS THE OUTPUT I GET
The problem with this output is that user_iD becomes the index rather than a column, and secondly the user_iD values are not 1, 2, 3, 4, 5, ... Let me post the head of user_iD too.
It doesn't return the expected output: it should take user_iD from the dataframe (dfm2) and keep it as a column, not an index.
Upvotes: 0
Views: 643
Reputation: 6555
To avoid user_iD being set as the index, use as_index=False in the groupby, like:
df_total_questions_attempted = dfm2.groupby(['user_iD'], as_index=False)['question_id'].count()
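To see the difference on something small, here is a rough sketch. The toy dfm2 below is made up for illustration (it only borrows the column names from the question), so treat the values as assumptions rather than the real dataset. It also shows reset_index() as an alternative way to turn the index back into a column.

```python
import pandas as pd

# Toy stand-in for dfm2; the values are invented for illustration only.
dfm2 = pd.DataFrame({
    "user_iD": [30, 10, 30, 20, 10, 30],
    "question_id": [1, 1, 2, 1, 2, 3],
    "user_correct": [1, 0, 0, 1, 1, 1],
})

# Default behaviour: the group key user_iD ends up as the index.
indexed = dfm2.groupby("user_iD")["question_id"].count()

# as_index=False keeps user_iD as an ordinary column.
flat = dfm2.groupby("user_iD", as_index=False)["question_id"].count()

# Alternative: group as usual, then move the index back into a column.
also_flat = indexed.reset_index()

print(flat)
```

Both flat and also_flat should come out as regular dataframes with user_iD and question_id columns, so either approach works for the concat step in the question.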
By default, the result is sorted by the groupby keys. If you don't want it sorted, set sort=False:
df_total_questions_attempted = dfm2.groupby(['user_iD'],
sort=False, as_index=False)['question_id'].count()
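Putting the two options together, the whole summary from the question could be built in one groupby. This is only a sketch under some assumptions: the toy dfm2 is invented, the output column names (total, correct, incorrect, percentage) are my own choice, and it assumes user_correct holds only 0 or 1 with no missing values, so that summing it counts the correct answers.

```python
import pandas as pd

# Toy stand-in for dfm2; the values are invented for illustration only.
dfm2 = pd.DataFrame({
    "user_iD": [30, 10, 30, 20, 10, 30],
    "question_id": [1, 1, 2, 1, 2, 3],
    "user_correct": [1, 0, 0, 1, 1, 1],
})

# One pass over the groups: count attempts and sum the 0/1 correctness flag.
# sort=False keeps users in order of first appearance, as_index=False keeps
# user_iD as a column instead of the index.
summary = dfm2.groupby("user_iD", sort=False, as_index=False).agg(
    total=("question_id", "count"),
    correct=("user_correct", "sum"),
)

# Incorrect answers are whatever is left (assumes user_correct is 0 or 1).
summary["incorrect"] = summary["total"] - summary["correct"]
summary["percentage"] = summary["correct"] / summary["total"] * 100

print(summary)
```

Since user_iD is now a normal column, something like summary.plot.bar(x="user_iD", y="percentage") should give one bar per user, assuming matplotlib is installed.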
Upvotes: 1