Khashhogi

Reputation: 119

Removing specific rows from a pandas DataFrame

I have a pd.DataFrame in which each row holds a student's exam performance metrics for one question. Each student has a unique ID, and there is one row per question that the student attempted. For example, the student with ID "a1a1" has attempted two questions (two rows), whereas the student with ID "w2e3" has attempted only one question (one row).

I want to find the students who have attempted fewer than 3 questions and remove all rows associated with them from the DataFrame. How can I do this with pd.DataFrame methods?

Upvotes: 0

Views: 37

Answers (1)

Mark

Reputation: 563

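Use value_counts() on the studentID column to count the rows per student, keep the IDs that appear at least 3 times, and then filter the original DataFrame with isin():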

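import pandas as pd

df = pd.DataFrame({'studentID':['a','a','a','b','b','b', 'c'],
                   'problemID':[1,2,3,1,2,3,1]})
print(df)

# Count how many rows (attempted questions) each student has
tmp = df['studentID'].value_counts()
# Keep only the students with at least 3 rows
tmp = tmp[tmp >= 3]
# Select the rows whose studentID belongs to one of those students
new_df = df[df['studentID'].isin(tmp.index)]
print(new_df)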

Output:

  studentID  problemID
0         a          1
1         a          2
2         a          3
3         b          1
4         b          2
5         b          3
6         c          1

  studentID  problemID
0         a          1
1         a          2
2         a          3
3         b          1
4         b          2
5         b          3
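
As a possible alternative, the same filter can be written without the intermediate Series by using groupby. This is only a sketch on the same toy data; the variable name rows_per_student is made up for illustration, and it assumes, as in the question, that each row is one attempted question, so counting rows is the same as counting questions.

```python
import pandas as pd

df = pd.DataFrame({'studentID': ['a', 'a', 'a', 'b', 'b', 'b', 'c'],
                   'problemID': [1, 2, 3, 1, 2, 3, 1]})

# For every row, the number of rows that share its studentID
# (the result is aligned with df's index, so it can be used as a mask)
rows_per_student = df.groupby('studentID')['studentID'].transform('size')

# Keep only the rows of students with at least 3 rows
new_df = df[rows_per_student >= 3]
print(new_df)

# Equivalent, but usually slower on large frames because the lambda
# is called once per group:
same_df = df.groupby('studentID').filter(lambda g: len(g) >= 3)
```

If a student could have several rows for the same question, counting distinct problemID values instead (for example with transform('nunique') on the problemID column) would probably be closer to what the question asks for.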

Upvotes: 1
