I have a pd.DataFrame in which each row holds one student's performance metrics for one exam question. Each student has a unique ID, and there is one row per question that student attempted. For example, the student with ID "a1a1" attempted two questions, whereas the student with ID "w2e3" attempted only one. [Image: sample df]
I want to find the students who attempted fewer than 3 questions and remove all rows belonging to them from the DataFrame. How can I do this with pd.DataFrame methods?
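Use value_counts() on the studentID column to count how many rows (attempted questions) each student has, keep only the IDs that appear at least 3 times, and then filter the DataFrame with isin():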
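import pandas as pd

df = pd.DataFrame({'studentID': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
                   'problemID': [1, 2, 3, 1, 2, 3, 1]})
print(df)

# Number of rows (attempted questions) per student
tmp = df['studentID'].value_counts()
# Keep only the students with at least 3 attempts
tmp = tmp[tmp >= 3]
# Keep the rows whose studentID is one of those students
new_df = df[df['studentID'].isin(tmp.index)]
print(new_df)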
Output:
  studentID  problemID
0         a          1
1         a          2
2         a          3
3         b          1
4         b          2
5         b          3
6         c          1
  studentID  problemID
0         a          1
1         a          2
2         a          3
3         b          1
4         b          2
5         b          3
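An alternative sketch, not part of the original answer, that skips the intermediate tmp Series by grouping on studentID directly. It assumes the same toy data as above; MIN_QUESTIONS is a hypothetical name for the threshold of 3. groupby(...).transform('size') broadcasts each student's row count back onto every row, so the result lines up with df's index and can be used as a boolean mask. groupby(...).filter(...) is shown as a second option that keeps or drops whole groups.

```python
import pandas as pd

# Same toy data as above: one row per (student, question) attempt
df = pd.DataFrame({'studentID': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
                   'problemID': [1, 2, 3, 1, 2, 3, 1]})

MIN_QUESTIONS = 3  # hypothetical name for the threshold from the question

# Option 1: per-row count of that student's rows, aligned with df's index
counts = df.groupby('studentID')['studentID'].transform('size')
new_df = df[counts >= MIN_QUESTIONS]

# Option 2: keep only the groups (students) for which the function returns True
new_df_filter = df.groupby('studentID').filter(lambda g: len(g) >= MIN_QUESTIONS)

print(new_df)
```

Both options keep the original row order and index, so they produce the same rows as the value_counts()/isin() version. The transform approach is usually faster on large frames because filter calls a Python function once per group.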