Reputation: 43
I'm trying to remove elements from a list if they contain a string pattern found in a different list. It's probably really basic, but I can't find a solution that works for this anywhere. All other questions on here seem to pertain to dropping elements if they are found in a given string, not in a list of strings.
So I have a list of strings (molecules as it were):
molecules = ['[C:1]([H])(=[O:3])O.[c:3]1([CH3:8])[cH:7][c:11]',
'[C:1]([H])(=[O:3])O.[c:7]1([NH:8][CH3:11])[cH:7]',
'[C:1]12([H])[c:3]1([cH:8][cH:12]',
'[C:2]12[c:6]1([cH:9][cH:12]']
And a list of patterns that I don't want in my list of strings:
patterns_to_drop = ['[C:1]12([H])[c:3]', '[C:2]12[c:6]']
Desired output:
['[C:1]([H])(=[O:3])O.[c:3]1([CH3:8])[cH:7][c:11]', '[C:1]([H])(=[O:3])O.[c:7]1([NH:8][CH3:11])[cH:7]']
I.e. in this case I want to drop molecules[2] and molecules[3] as they match the pattern.
This code below runs, but as pointed out in the comments, this will just remove exact matches whereas I'd like to remove based on partial matches.
to_keep = [x for x in molecules if x not in patterns_to_drop]
print(to_keep)
>> ['[C:1]([H])(=[O:3])O.[c:3]1([CH3:8])[cH:7][c:11]',
'[C:1]([H])(=[O:3])O.[c:7]1([NH:8][CH3:11])[cH:7]',
'[C:1]12([H])[c:3]1([cH:8][cH:12]',
'[C:2]12[c:6]1([cH:9][cH:12]']
The only other working solution I've found is to paste the pattern directly in a list comprehension - but the list of patterns to drop will probably keep growing and I don't want to write a list comprehension for each. It's really important that the solution works for any length of either list.
I'm out of ideas so I'd really appreciate if someone could help out here.
Upvotes: 4
Views: 1141
Reputation: 707
You can use a list comprehension with a nested generator expression that checks each pattern as a substring of each molecule, for example:
to_keep = [x for x in molecules if all(y not in x for y in patterns_to_drop)]
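Here is the same idea as a small runnable sketch, in case it helps to see it end to end. The helper name drop_matching is just something I made up for illustration, and I'm assuming the first pattern is meant to read '[C:1]12([H])[c:3]' so that it really is a substring of molecules[2]. The matching is plain substring matching with `in`, not regex, so the brackets and parentheses need no escaping.

```python
def drop_matching(items, patterns):
    """Return the items that contain none of the given substrings.

    Hypothetical helper for illustration; works for lists of any length.
    """
    return [item for item in items if not any(p in item for p in patterns)]


molecules = [
    '[C:1]([H])(=[O:3])O.[c:3]1([CH3:8])[cH:7][c:11]',
    '[C:1]([H])(=[O:3])O.[c:7]1([NH:8][CH3:11])[cH:7]',
    '[C:1]12([H])[c:3]1([cH:8][cH:12]',
    '[C:2]12[c:6]1([cH:9][cH:12]',
]

# Assumption: the first pattern is written so that it literally occurs
# in molecules[2]; substring matching only drops literal occurrences.
patterns_to_drop = ['[C:1]12([H])[c:3]', '[C:2]12[c:6]']

to_keep = drop_matching(molecules, patterns_to_drop)
print(to_keep)
```

`not any(p in item ...)` is equivalent to the `all(y not in x ...)` form above; both stop checking a molecule as soon as one pattern matches. With an empty pattern list, nothing is dropped.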
Upvotes: 2