Select rows in a pandas DataFrame that match the first two different items of one column

Question

I would like to display all rows in the DataFrame whose values under the column 'Nameid' correspond to the two first different values found in that column.

In the example below, the two first different values under the column named 'Nameid' are 1 and 2. I want to select all rows for which 'Nameid' equals either 1 or 2, and discard the rest. How do I do that?

What I have:

import pandas as pd

df = pd.DataFrame(data={
    'Nameid': [1, 2, 3, 1],
    'Name': ['Michael', 'Max', 'Susan', 'Michael'],
    'Project': ['S455', 'G874', 'B7445', 'Z874'],
})
display(df.head(10))

What I want:

jezrael · Accepted Answer

First sorting by column Nameid by DataFrame.sort_values:

df = df.sort_values('Nameid')

then use Series.isin with 2 first unique values by Series.unique:

df1 = df[df['Nameid'].isin(df['Nameid'].unique()[:2])].reset_index(drop=True)
print (df1)
   Nameid     Name Project
0       1  Michael    S455
1       1  Michael    Z874
2       2      Max    G874

Alternative with Series.drop_duplicates:

df1 = df[df['Nameid'].isin(df['Nameid'].drop_duplicates()[:2])].reset_index(drop=True)

EDIT: If want filter by equal or less like 2, thank you @DarrylG:

df2 = df[df['Nameid'] <= 2].reset_index(drop=True)

Select rows in a pandas DataFrame that match the first two different items of one column

Answers (1)

Related Questions