rajiv

Reputation: 159

Remove list items that start with a given string in a pandas df

I have a df whose rows contain comma-separated lists, and I want to remove a particular string together with the values that follow it.

df['res']:

AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62,HOH A 604, HOH A 605, L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, HOH A 603, VA1 A 35

Desired output (every HOH entry removed, together with the letter and number that follow it):

AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62, L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, TY1 A 34, VA1 A 14, VA1 A 35

I tried this:

data['res'].str.split().apply(lambda x: [k for k in x if k.startswith('HOH')])

Upvotes: 2

Views: 764

Answers (1)

bootica

Reputation: 771

The problem is that .split() without an argument splits on whitespace, so every entry is itself broken into pieces.

So this ... ,HOH A 604 ... will split into ['...', ',HOH', 'A', '604', '...'].

As far as I understand, you want to remove every HOH together with the values that follow it, right?

Filtering after a plain .split() would remove at most the HOH piece itself and keep A and 604.

If you use .split(',') with the comma as the separator, you get each entry between the commas as one string.
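To make the difference concrete, here is a small sketch in plain Python on a shortened version of the sample string from the question. The variable names are only illustrative.

```python
# Shortened sample from the question; note the missing space before "HOH A 604".
s = "GL1 A 62,HOH A 604, HOH A 605, L21 A 18"

# Splitting on whitespace breaks each entry apart and leaves "62,HOH" glued together.
by_whitespace = s.split()

# Splitting on commas keeps each entry whole (some entries keep a leading space).
by_comma = s.split(',')

print(by_whitespace)
print(by_comma)
```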

The problem I see with startswith is that sometimes your strings have an additional space after the comma and sometimes they don't (e.g. ,HOH A 604 & , HOH A 605).

Therefore I would suggest using not in instead. But be aware that this removes every substring that contains HOH anywhere, not just at the start.

Try this:

df['res'].str.split(',').apply(lambda x: [k for k in x if 'HOH' not in k])
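As a rough check of what that lambda does to a single cell, the same list comprehension can be run on a shortened sample string in plain Python (no pandas needed; `cell` is just an illustrative name).

```python
cell = "GL1 A 62,HOH A 604, HOH A 605, L21 A 18"

# Same filter as in the lambda above, applied to one string.
filtered = [k for k in cell.split(',') if 'HOH' not in k]

# Entries keep whatever leading space they had in the original string.
print(filtered)
```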

The cell value is now a list of strings. If you need a string again, try this:

df['res'].str.split(',').apply(lambda x: ','.join([k for k in x if 'HOH' not in k]))
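If matching HOH anywhere in an entry is too broad, one possible variant is to strip each entry first and then use startswith after all. The sketch below assumes a one-row df built from the sample in the question; the helper name `drop_hoh` is made up for illustration. It also assigns the result back to the column, since apply returns a new Series rather than changing df in place.

```python
import pandas as pd

# One-row frame holding the sample string from the question.
df = pd.DataFrame({
    "res": [
        "AL1 A 15, CY1 A 16, CY1 A 20, GL1 A 17, GL1 A 62,HOH A 604, "
        "HOH A 605, L21 A 18, MG A 550, PR1 A 36, TH1 A 19, TH1 A 37, "
        "TY1 A 34, VA1 A 14, HOH A 603, VA1 A 35"
    ]
})

def drop_hoh(cell):
    # Hypothetical helper: strip each entry so the inconsistent spacing
    # after commas does not matter, then keep only the entries that do
    # not start with "HOH".
    entries = [entry.strip() for entry in cell.split(',')]
    return ', '.join(entry for entry in entries if not entry.startswith('HOH'))

# Assign the result back so the column actually changes.
df['res'] = df['res'].apply(drop_hoh)
print(df['res'].iloc[0])
```

Joining with ', ' also normalises the spacing, so the result matches the desired output in the question.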

Upvotes: 1
