Eisen
Eisen

Reputation: 1897

multiple conditions to subset dataframe based on related id

I have the following dataframe called df that has an id column and a related_id column. If the related_id is not null that means there is a desktop version and a mobile version. I want to subset this dataset to keep only desktop instances if there is a related id. There are also situations however where there can be a mobile observation which has a related id which is also a mobile observation (ids 7 and 8 are examples). In this case i want to keep just one of those observations.

input:

id related_id type
1e  2e          desktop
2e  1e          mobile
3x  NaN        desktop
4w  6w          desktop
5m  NaN        mobile
6w  4w          mobile
7q  8q          mobile
8q  7q          mobile

output:

id related_id type
1e  2e          desktop
3x  NaN        desktop
4w  6w          desktop
5m  NaN        mobile
7q  8q          mobile


how can i do this in pandas?

Upvotes: 1

Views: 77

Answers (1)

anky
anky

Reputation: 75080

You can np.sort the id and related_id on axis=1 and create a helper series for grouping, then calculate the idxmin with a helper dictionary for the weights:

s = pd.Series(map(tuple,np.sort(df[['id','related_id']].fillna(''),axis=1)))
d= {'desktop':1,'mobile':2}
out = df.loc[df['type'].map(d).groupby(s,sort=False).idxmin()]

print(out)

   id related_id     type
0  1e         2e  desktop
2  3x        NaN  desktop
3  4w         6w  desktop
4  5m        NaN   mobile
6  7q         8q   mobile

Upvotes: 1

Related Questions