Haseeb Sultan

Reputation: 103

Group by based on user id and their interactions

Dataset and Notebook file : https://drive.google.com/drive/folders/14z16wOEjKe299oSxu_wlh5Zr-dfbnXgE?usp=sharing

Can anyone help me out on this?

I have a DataFrame named dfm2.

Head of the DataFrame (image: Dataframe Head)

Tail of the DataFrame (image: DataFrame Tail)

I want to see how many questions each user has attempted in total, and how many of those were correct and incorrect. I would then like to plot this, with each user on the X axis and the percentage on the Y axis.

question_id : contains the question number

user_answer : contains what the user answered to that question (a, b, c or d)

user_iD : identifies each user

correct_answer : it's basically the answer key.

user_correct : it's 0 if user answer is incorrect and 1 if user answers correctly

What I have tried so far

import pandas as pd

df_total_questions_attempted = dfm2.groupby(['user_iD'])['question_id'].count().to_frame('Total Questions Attempted')

df_correct = dfm2[dfm2['user_correct']==1].groupby(['user_iD'])['question_id'].count().to_frame('Correct')

df_incorrect = dfm2[dfm2['user_correct']==0].groupby(['user_iD'])['question_id'].count().to_frame('Incorrect')

df = pd.concat([df_total_questions_attempted, df_correct, df_incorrect], axis=1).fillna(0)

df['Percentage'] = (df['Correct'] / df['Total Questions Attempted']) *100

This is the output I get (image not included).

The problem with this output is that user_iD becomes the index rather than a column. Also, the user_iD values are not sequential (1, 2, 3, 4, 5, ...). Here is the head of the user_iD column as well (image not included).

This is not what I expected: user_iD should be taken from the DataFrame (dfm2) and kept as a column, not as the index.

Expected output (image: Expected Output)

Upvotes: 0

Views: 643

Answers (1)

heretolearn

Reputation: 6555

To avoid user_iD being set as the index, pass as_index=False to groupby:

df_total_questions_attempted = dfm2.groupby(['user_iD'], as_index=False)['question_id'].count()

By default, the result is sorted by the groupby keys. If you don't want it sorted, also pass sort=False:

df_total_questions_attempted = dfm2.groupby(['user_iD'], 
                               sort=False, as_index=False)['question_id'].count()
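
As a rough sketch of how the same two options could produce the whole summary table in one pass, the example below uses named aggregation on a small made-up DataFrame. The toy data and the output column names (total_attempted, correct, incorrect, percentage) are assumptions for illustration only, not taken from the asker's dataset. It also assumes user_correct contains only 0 and 1 with no missing values.

```python
import pandas as pd

# Hypothetical toy data standing in for dfm2 (the real dataset is not reproduced here).
dfm2 = pd.DataFrame({
    "user_iD": [7, 7, 3, 3, 3, 9],
    "question_id": [1, 2, 1, 2, 3, 1],
    "user_correct": [1, 0, 1, 1, 0, 0],
})

# as_index=False keeps user_iD as a regular column;
# sort=False keeps users in order of first appearance instead of sorting the keys.
summary = dfm2.groupby("user_iD", as_index=False, sort=False).agg(
    total_attempted=("question_id", "count"),  # rows per user
    correct=("user_correct", "sum"),           # sum of 0/1 flags = number correct
)

# Derive the remaining columns from the two aggregates.
summary["incorrect"] = summary["total_attempted"] - summary["correct"]
summary["percentage"] = summary["correct"] / summary["total_attempted"] * 100

print(summary)
```

If the original concat-based approach is kept instead, calling reset_index() on the result also moves the index back into a column. For the plot, something like summary.plot.bar(x="user_iD", y="percentage") should work, provided matplotlib is installed.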

Upvotes: 1
