Reputation: 69
I have a pandas dataframe with a list of courses that have been attempted: the ID of the student, the course ID, the term it was attempted, and the grade they received. It looks sort of like:
Student ID Course ID Academic Term Final Grade
N99999 MAT101 201501 2.0
N99999 MAT101 201509 2.5
N11111 CHE101 201609 3.0
N11111 PSY101 201601 0.0
N11111 PSY101 201701 2.0
I am interested in analyzing their grades and whether students improved by repeating a course.
To do so, I would like to add an 'attempt' column, where the row with the lowest term value for each combined course and ID number would get a 1, the next a 2, etc.
Student ID Course ID Academic Term Final Grade Attempt
N99999 MAT101 201501 2.0 1
N99999 MAT101 201509 2.5 2
N11111 CHE101 201609 3.0 1
N11111 PSY101 201601 0.0 1
N11111 PSY101 201701 2.0 2
How would I go about doing this?
Thank you.
Upvotes: 1
Views: 576
Reputation: 7038
It looks like you're looking for a standard Partition. There are several ways to do this but I've been doing it the below way. Try the below (note I changed/condensed some of the data):
df
Output:
student_id course_id academic_term final_grade
0 n999 mat101 201501 2.0
1 n999 mat101 201509 2.5
2 n111 che101 201609 3.0
3 n111 psy101 201601 0.0
4 n111 psy101 201701 2.0
Input
df['attempt'] = df.sort_values('academic_term').groupby(['student_id', 'course_id'], sort=False).cumcount()+1
Output:
student_id course_id academic_term final_grade attempt
0 n999 mat101 201501 2.0 1
1 n999 mat101 201509 2.5 2
2 n111 che101 201609 3.0 1
3 n111 psy101 201601 0.0 1
4 n111 psy101 201701 2.0 2
Upvotes: 4