Eisen
Eisen

Reputation: 1897

subsetting dataframe based on related_id column

I have the following dataframe called df that has an id column and a related_id column. If the related_id is not null that means there is a desktop version and a mobile version. I want to subset this dataset to keep only desktop instances if there is a related id.

input:

id related_id type
1  2          desktop
2  1          mobile
3  NaN        desktop
4  6          desktop
5  NaN        mobile
6  4          mobile

output:

id related_id type
1  2          desktop
3  NaN        desktop
4  6          desktop
5  NaN        mobile

how can i do this in pandas?

Upvotes: 1

Views: 40

Answers (3)

Utsav
Utsav

Reputation: 5918

We can place the necessary conditions in query method to filter the df.

df.query('related_id.isna() | type=="desktop"')

Output

    id  related_id  type
0   1   2.0       desktop
2   3   NaN       desktop
3   4   6.0       desktop
4   5   NaN       mobile

Upvotes: 1

BENY
BENY

Reputation: 323306

Try with numpy sort

df[~pd.DataFrame(np.sort(df[['id','related_id']].values,axis=1)).duplicated().values]
   id  related_id     type
0   1         2.0  desktop
2   3         NaN  desktop
3   4         6.0  desktop
4   5         NaN   mobile

Upvotes: 1

Anurag Dabas
Anurag Dabas

Reputation: 24314

Try with boolean masking:

mask=(df['related_id'].isna()) | (df['type']=='desktop')

Finally:

result=df[mask]

Now if you print result you will get your desired output:

    id  related_id  type
0   1   2.0         desktop
2   3   NaN         desktop
3   4   6.0         desktop
4   5   NaN         mobile

Upvotes: 1

Related Questions