Reputation: 75
I know this is open-ended, but I'm not sure how to go about it.
Say I have the string "FDBFBDFLDJVHVBDVBD" and want to find every sub-string that starts with something like "BDF" and ends with either "EFG" or "EDS". Is there an easy way to do this?
Upvotes: 1
Views: 43
Reputation: 414685
find every sub-string that starts with something like "BDF" and ends with either "EFG" or "EDS"
It is a job for a regular expression. To extract all such substrings as a list:
import re
substrings = re.findall(r'BDF.*?E(?:FG|DS)', text)
If a substring might contain newlines, pass flags=re.DOTALL.
Example:
>>> re.findall(r'BDF.*?E(?:FG|DS)', "FDBFBDFLDJVHVBDVBDBDFEFGEDS")
['BDFLDJVHVBDVBDBDFEFG']
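.*? is non-greedy, so each match stops at the first ending that follows the leftmost "BDF". That is the shortest match from that starting point, not necessarily the shortest possible substring (the later 'BDFEFG' is not reported here because it lies inside the first match). Remove the ? to get the longest match instead.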
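To make the difference concrete, here is a small sketch that runs both forms of the pattern on the example string above. The variable names lazy and greedy are only illustrative.

```python
import re

text = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"

# Non-greedy: stop at the first ending after the leftmost "BDF".
lazy = re.findall(r'BDF.*?E(?:FG|DS)', text)

# Greedy: extend to the last ending in the string.
greedy = re.findall(r'BDF.*E(?:FG|DS)', text)

print(lazy)    # ['BDFLDJVHVBDVBDBDFEFG']
print(greedy)  # ['BDFLDJVHVBDVBDBDFEFGEDS']
```

Neither form reports overlapping matches, because findall resumes scanning after the end of the previous match.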
Upvotes: 1
Reputation: 6575
You can use re.finditer to locate every occurrence of the start marker:
>>> import re
>>> s = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"
>>> print([s[a.start():a.end()] for a in re.finditer('BDF', s)])
['BDF', 'BDF']
Upvotes: 1
Reputation: 896
Seeing as there is no regex expert here yet, I will propose this solution (BTW I added "BDFEFGEDS" to the end of your string so it would give some results):
import re

s = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"
endings = ['EFG', 'EDS']
matches = []
for ending in endings:
    # A capturing group inside a lookahead lets findall return overlapping matches.
    match = re.findall(r'(?=(BDF.*{0}))'.format(ending), s)
    matches.extend(match)
print(matches)
giving the result:
['BDFLDJVHVBDVBDBDFEFG', 'BDFEFG', 'BDFLDJVHVBDVBDBDFEFGEDS', 'BDFEFGEDS']
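One caveat with the greedy .* above: for each start position it returns only the longest match per ending, so if an ending occurred more than once, the shorter candidates would be skipped. A possible sketch that pairs every start with every later ending is shown below. The helper name marked_substrings and its parameters are hypothetical, not part of the re module.

```python
import re

def marked_substrings(s, start, endings):
    """Return every substring of s that begins with start and ends with one of endings."""
    # Zero-width lookaheads let finditer report overlapping occurrences as well.
    start_positions = [m.start() for m in re.finditer('(?=' + re.escape(start) + ')', s)]
    ending_spans = [
        m.span(1)
        for ending in endings
        for m in re.finditer('(?=(' + re.escape(ending) + '))', s)
    ]
    return [
        s[i:end]
        for i in start_positions
        for begin, end in ending_spans
        if begin >= i + len(start)  # the ending must come after the start marker
    ]

s = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"
result = marked_substrings(s, "BDF", ["EFG", "EDS"])
print(result)
```

For this particular string it finds the same four substrings as above, in a different order, since each ending appears only once.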
Upvotes: 0