Reputation: 1897
I have the following dataframe called df
that has an id
column and a related_id
column. If the related_id is not null that means there is a desktop version and a mobile version.
I want to subset this dataset to keep only desktop instances if there is a related id.
input:
id related_id type
1 2 desktop
2 1 mobile
3 NaN desktop
4 6 desktop
5 NaN mobile
6 4 mobile
output:
id related_id type
1 2 desktop
3 NaN desktop
4 6 desktop
5 NaN mobile
how can i do this in pandas?
Upvotes: 1
Views: 40
Reputation: 5918
We can place the necessary conditions in query method to filter the df.
df.query('related_id.isna() | type=="desktop"')
Output
id related_id type
0 1 2.0 desktop
2 3 NaN desktop
3 4 6.0 desktop
4 5 NaN mobile
Upvotes: 1
Reputation: 323306
Try with numpy
sort
df[~pd.DataFrame(np.sort(df[['id','related_id']].values,axis=1)).duplicated().values]
id related_id type
0 1 2.0 desktop
2 3 NaN desktop
3 4 6.0 desktop
4 5 NaN mobile
Upvotes: 1
Reputation: 24314
Try with boolean masking:
mask=(df['related_id'].isna()) | (df['type']=='desktop')
Finally:
result=df[mask]
Now if you print result you will get your desired output:
id related_id type
0 1 2.0 desktop
2 3 NaN desktop
3 4 6.0 desktop
4 5 NaN mobile
Upvotes: 1