Reputation:
I would like to display all rows in the DataFrame whose values under the column 'Nameid'
correspond to the two first different values found in that column.
In the example below, the two first different values under the column named 'Nameid'
are 1
and 2
. I want to select all rows for which 'Nameid'
equals either 1
or 2
, and discard the rest. How do I do that?
What I have:
import pandas as pd
df = pd.DataFrame(data={
'Nameid': [1, 2, 3, 1],
'Name': ['Michael', 'Max', 'Susan', 'Michael'],
'Project': ['S455', 'G874', 'B7445', 'Z874'],
})
display(df.head(10))
What I want:
Upvotes: 1
Views: 179
Reputation: 863301
First sorting by column Nameid
by DataFrame.sort_values
:
df = df.sort_values('Nameid')
then use Series.isin
with 2 first unique values by Series.unique
:
df1 = df[df['Nameid'].isin(df['Nameid'].unique()[:2])].reset_index(drop=True)
print (df1)
Nameid Name Project
0 1 Michael S455
1 1 Michael Z874
2 2 Max G874
Alternative with Series.drop_duplicates
:
df1 = df[df['Nameid'].isin(df['Nameid'].drop_duplicates()[:2])].reset_index(drop=True)
EDIT: If want filter by equal or less like 2
, thank you @DarrylG:
df2 = df[df['Nameid'] <= 2].reset_index(drop=True)
Upvotes: 1