Removing specific rows from a pandas DataFrame

Question

I have a pd.DataFrame that holds students' exam performance metrics. Each student has a unique ID, and there is one row for each question a student attempted on the exam. For example, the student with ID "a1a1" attempted two questions, whereas the student with ID "w2e3" attempted only one. (sample df)

I want to find the students who attempted fewer than 3 questions and remove their rows from the DataFrame. How can I do this with pd.DataFrame methods?

Mark · Accepted Answer

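Use value_counts() on the studentID column to count the rows per student, keep the IDs that appear at least 3 times, and then select the matching rows with isin():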

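import pandas as pd

df = pd.DataFrame({'studentID':['a','a','a','b','b','b', 'c'],
                   'problemID':[1,2,3,1,2,3,1]})
print(df)

# Count how many rows (attempted questions) each student has.
tmp = df['studentID'].value_counts()
# Keep only the students with at least 3 attempts.
tmp = tmp[tmp >= 3]
# Select the rows whose studentID is among the remaining IDs.
new_df = df[df['studentID'].isin(tmp.index)]
print(new_df)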

Output:

  studentID  problemID
0         a          1
1         a          2
2         a          3
3         b          1
4         b          2
5         b          3
6         c          1

  studentID  problemID
0         a          1
1         a          2
2         a          3
3         b          1
4         b          2
5         b          3
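
As a possible alternative to the value_counts() approach above, the same filtering can probably be done with groupby. This is only a sketch on the same toy data; the variable names attempts and filtered are made up for illustration. transform('size') repeats each student's row count on every one of that student's rows, so the result can be used directly as a boolean mask:

```python
import pandas as pd

df = pd.DataFrame({'studentID': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
                   'problemID': [1, 2, 3, 1, 2, 3, 1]})

# Per-student row count, aligned with the original index of df.
attempts = df.groupby('studentID')['problemID'].transform('size')

# Keep only the rows of students with at least 3 attempts.
new_df = df[attempts >= 3]
print(new_df)

# Shorter to write, but calls a Python function once per group,
# which can be slower when there are many students.
filtered = df.groupby('studentID').filter(lambda g: len(g) >= 3)
print(filtered)
```

Both variants keep the original row order and index, so student "c" is dropped and the rows for "a" and "b" stay as they were.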
