Reputation: 1897
I have the following dataframe called df
that has an id
column and a related_id
column. If the related_id is not null that means there is a desktop version and a mobile version.
I want to subset this dataset to keep only desktop instances if there is a related id. There are also situations however where there can be a mobile observation which has a related id which is also a mobile observation (ids 7 and 8 are examples). In this case i want to keep just one of those observations.
input:
id related_id type
1e 2e desktop
2e 1e mobile
3x NaN desktop
4w 6w desktop
5m NaN mobile
6w 4w mobile
7q 8q mobile
8q 7q mobile
output:
id related_id type
1e 2e desktop
3x NaN desktop
4w 6w desktop
5m NaN mobile
7q 8q mobile
how can i do this in pandas?
Upvotes: 1
Views: 77
Reputation: 75080
You can np.sort
the id
and related_id
on axis=1
and create a helper series for grouping, then calculate the idxmin
with a helper dictionary for the weights:
s = pd.Series(map(tuple,np.sort(df[['id','related_id']].fillna(''),axis=1)))
d= {'desktop':1,'mobile':2}
out = df.loc[df['type'].map(d).groupby(s,sort=False).idxmin()]
print(out)
id related_id type
0 1e 2e desktop
2 3x NaN desktop
3 4w 6w desktop
4 5m NaN mobile
6 7q 8q mobile
Upvotes: 1