Reputation: 3754
Given a dataframe in this form:
ID A
130 Yes
130-1 Yes
130-2 Yes
200 No
201 No
201-10 No
201-101 Yes
201-22 Yes
300 No
I want to drop each row whose ID value appears before the hyphen (-) in the ID of another row.
Based on this, I would drop the row with ID 201, since there are 201-10, 201-101, etc.
Expected output:
ID A
130-1 Yes
130-2 Yes
200 No
201-10 No
201-101 Yes
201-22 Yes
300 No
Upvotes: 3
Views: 56
Reputation: 51175
You can use duplicated and a bitwise XOR. Note that this relies on each value without a hyphen appearing before the hyphenated values that share its prefix.
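# Prefix before the first hyphen, e.g. '201-10' -> '201'
s = df['ID'].str.split('-').str[0]
# True only for the first occurrence of a prefix that appears more than once
m = s.duplicated(keep=False) ^ s.duplicated()
df[~m]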
ID A
1 130-1 Yes
2 130-2 Yes
3 200 No
5 201-10 No
6 201-101 Yes
7 201-22 Yes
8 300 No
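To make the XOR step easier to follow, here is a self-contained sketch that rebuilds the sample frame from the question and names the two intermediate masks. The names in_repeated_group, not_first and result are mine, added for illustration only; the logic is the same as the three lines above and carries the same ordering assumption.

```python
import pandas as pd

# Sample frame rebuilt from the question (IDs are strings)
df = pd.DataFrame({
    'ID': ['130', '130-1', '130-2', '200', '201',
           '201-10', '201-101', '201-22', '300'],
    'A': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'No'],
})

# Prefix before the first hyphen
s = df['ID'].str.split('-').str[0]

# True for every row whose prefix occurs more than once
in_repeated_group = s.duplicated(keep=False)
# True for every occurrence of a prefix except the first one
not_first = s.duplicated()

# XOR leaves True only on the first row of each repeated prefix,
# which is the bare ID when it comes before its hyphenated rows
m = in_repeated_group ^ not_first

result = df[~m]
print(result)
```

Rows with a unique prefix (200, 300) are False in both masks, so the XOR keeps them.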
Upvotes: 3
Reputation: 88236
Here's one approach:
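# Prefix before the first hyphen
g = df.ID.str.split('-').str[0]
# True when the previous row has the same prefix (assumes related rows are adjacent)
is_child = g.eq(g.shift())
# True when the prefix occurs only once in the whole column
is_unique = g.groupby(g).transform('size').eq(1)
output = df[is_child | is_unique]
print(output)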
ID A
1 130-1 Yes
2 130-2 Yes
3 200 No
5 201-10 No
6 201-101 Yes
7 201-22 Yes
8 300 No
Where:
df.assign(first_num=g,
          is_child=is_child,
          is_unique=is_unique)
        ID    A first_num  is_child  is_unique
0      130  Yes       130     False      False
1    130-1  Yes       130      True      False
2    130-2  Yes       130      True      False
3      200   No       200     False       True
4      201   No       201     False      False
5   201-10   No       201      True      False
6  201-101  Yes       201      True      False
7   201-22  Yes       201      True      False
8      300   No       300     False       True
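The comparison against g.shift() assumes that a bare ID sits directly before its hyphenated rows. If the rows might arrive in any order, a possible alternative (not part of the answer above, just a sketch) is to drop a row only when it has no hyphen and its ID matches the prefix of some hyphenated row. The function name drop_parents_with_children and the shuffled sample frame are hypothetical, and the sketch assumes a "parent" is always an ID with no hyphen at all.

```python
import pandas as pd

def drop_parents_with_children(frame, col='ID'):
    """Drop rows whose un-hyphenated ID is the prefix of a hyphenated ID."""
    ids = frame[col].astype(str)
    # Rows that contain a literal hyphen are treated as children
    has_hyphen = ids.str.contains('-', regex=False)
    # Prefixes (text before the first hyphen) of all child rows
    child_prefixes = ids[has_hyphen].str.split('-').str[0].unique()
    # A parent is a hyphen-free ID that matches at least one child prefix
    is_parent_with_children = ~has_hyphen & ids.isin(child_prefixes)
    return frame[~is_parent_with_children]

# Same rows as in the question, deliberately out of order
shuffled = pd.DataFrame({
    'ID': ['201-10', '130-1', '201', '300', '130',
           '200', '201-101', '130-2', '201-22'],
    'A': ['No', 'Yes', 'No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes'],
})

kept = drop_parents_with_children(shuffled)
print(kept)
```

This never looks at neighbouring rows, so sorting is not required; the trade-off is that an ID such as 201-10 is never treated as a parent of something like 201-10-1.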
Upvotes: 1