I have a df whose rows contain lists, and I want to remove a particular string together with the values attached to it.
df['res']:
AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62,HOH A 604, HOH A 605, L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, HOH A 603, VA1 A 35
Desired output (every HOH entry removed together with its number):
AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62, L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, VA1 A 35
I tried this:
data['res'].str.split().apply(lambda x: [k for k in x if k.startswith('HOH')])
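To see why this attempt falls short, here is the same logic run on the single sample value in plain Python (str.split on one string instead of the .str accessor on the column). It keeps the HOH tokens instead of dropping them, and it misses the HOH that is glued to "62," because there is no space after that comma.

```python
# The sample value from the question
res = ("AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62,HOH A 604, HOH A 605, "
       "L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, "
       "HOH A 603, VA1 A 35")

# Same logic as the attempt: split on whitespace, then KEEP tokens starting with 'HOH'
tokens = res.split()
print([k for k in tokens if k.startswith('HOH')])  # ['HOH', 'HOH']
```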
The problem is that .split() with no argument splits on every run of whitespace, so every entry gets broken into separate tokens and the commas stay attached to their neighbours. For example, ... 62,HOH A 604, ... splits into ['...', '62,HOH', 'A', '604,', '...'].
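As far as I understand, you want to remove every HOH entry together with the letter and number that follow it, right? With the .split() approach you could drop only the HOH token itself, while A and 604 would stay behind as separate tokens. (Your list comprehension also keeps the tokens that start with HOH instead of dropping them.)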
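A small plain-Python check on a fragment of the sample value shows both effects: the whitespace split glues commas onto tokens, and dropping only the HOH tokens leaves their chain letter and number behind.

```python
# Fragment of the sample value from the question
s = "GL1 A 62,HOH A 604, HOH A 605"

# No argument: split on every run of whitespace; commas stay glued to tokens
tokens = s.split()
print(tokens)  # ['GL1', 'A', '62,HOH', 'A', '604,', 'HOH', 'A', '605']

# Dropping the 'HOH' tokens leaves their chain letter and number behind
print([t for t in tokens if t != 'HOH'])
# ['GL1', 'A', '62,HOH', 'A', '604,', 'A', '605']
```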
If you use .split(',') with the comma as the separator instead, each element of the result is one whole entry, i.e. everything between two commas.
The problem I see with startswith is that some of your entries have a space after the comma and some don't (e.g. ,HOH A 604 vs. , HOH A 605), so after splitting on commas some pieces start with a space and startswith('HOH') misses them.
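A quick check on a fragment of the sample value illustrates this. The strip() call in the last line is an extra that the answer itself does not use; it is shown only to make clear that the leading space is what trips up startswith.

```python
# Pieces obtained by splitting a fragment of the sample value on commas
items = "GL1 A 62,HOH A 604, HOH A 605".split(',')
print(items)  # ['GL1 A 62', 'HOH A 604', ' HOH A 605']

# The piece that followed ', ' starts with a space, so startswith misses it
print([k.startswith('HOH') for k in items])          # [False, True, False]

# Stripping first makes both HOH pieces match
print([k.strip().startswith('HOH') for k in items])  # [False, True, True]
```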
Therefore I would suggest using not in instead. Be aware, though, that this removes every piece that contains HOH anywhere, even at the end.
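To make the caveat concrete, the sketch below adds a hypothetical entry 'XHOH A 9' (not in the question's data) that contains HOH without starting with it. The substring test drops it, while an alternative that strips each piece and then checks startswith keeps it. The strip-then-startswith filter is a variation, not the approach this answer recommends.

```python
# Hypothetical entry 'XHOH A 9' added to show the difference:
# it contains 'HOH' but does not start with it
res = "GL1 A 62,HOH A 604, HOH A 605, XHOH A 9"

# Substring test: drops every piece containing 'HOH' anywhere
by_substring = [k for k in res.split(',') if 'HOH' not in k]

# Strip first, then startswith: drops only pieces that begin with 'HOH'
by_prefix = [k for k in res.split(',') if not k.strip().startswith('HOH')]

print(by_substring)  # ['GL1 A 62']
print(by_prefix)     # ['GL1 A 62', ' XHOH A 9']
```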
Try this:
df['res'].str.split(',').apply(lambda x: [k for k in x if 'HOH' not in k])
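A self-contained run of that line, assuming a one-row DataFrame built from the sample value in the question. Note that the kept pieces still carry the space that followed each comma. The stripped variant underneath is an addition, not part of the answer, for when you want clean list items.

```python
import pandas as pd

# One-row DataFrame holding the sample value from the question
df = pd.DataFrame({'res': [
    "AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62,HOH A 604, HOH A 605, "
    "L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, "
    "HOH A 603, VA1 A 35"
]})

# Split each cell on commas and drop every piece that contains 'HOH';
# the kept pieces still carry the space that followed each comma
as_list = df['res'].str.split(',').apply(lambda x: [k for k in x if 'HOH' not in k])

# Variant (not in the answer): strip each kept piece as well
as_list_stripped = df['res'].str.split(',').apply(
    lambda x: [k.strip() for k in x if 'HOH' not in k])

print(as_list[0][:3])           # ['AL1 A 15', ' CY1 A 16', ' CY1 A 20']
print(as_list_stripped[0][:3])  # ['AL1 A 15', 'CY1 A 16', 'CY1 A 20']
```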
The cell value is now a list of strings. If you need a string again, try this:
df['res'].str.split(',').apply(lambda x: ','.join([k for k in x if 'HOH' not in k]))
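A check of the string version against the desired output from the question, again assuming a one-row DataFrame built from the sample value; the column name res_clean is a hypothetical name chosen for this sketch.

```python
import pandas as pd

# One-row DataFrame holding the sample value from the question
df = pd.DataFrame({'res': [
    "AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62,HOH A 604, HOH A 605, "
    "L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, "
    "HOH A 603, VA1 A 35"
]})

# Keep the non-HOH pieces and glue them back together with ','.
# Each kept piece still starts with its original space, so joining with a
# bare ',' reproduces the ', ' separators of the input.
df['res_clean'] = df['res'].str.split(',').apply(
    lambda x: ','.join([k for k in x if 'HOH' not in k]))

# Desired output copied from the question
desired = ("AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62, L21 A 18, "
           "MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, VA1 A 35")

print(df.loc[0, 'res_clean'] == desired)  # True
```

Because the original spacing is carried through, a kept entry that had no space after its comma would stay that way; stripping each piece and joining with ', ' would normalise it instead.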