user5987132

Reputation: 75

Is there an easy way to find a substring that matches a pattern within a string and extract it?

I know this is open-ended, but I'm not sure how to go about it.

Say I have the string "FDBFBDFLDJVHVBDVBD" and want to find every substring that starts with something like "BDF" and ends with either "EFG" or "EDS". Is there an easy way to do this?

Upvotes: 1

Views: 43

Answers (3)

jfs

Reputation: 414685

find every sub-string that starts with something like "BDF" and ends with either "EFG" or "EDS"

It is a job for a regular expression. To extract all such substrings as a list:

import re

substrings = re.findall(r'BDF.*?E(?:FG|DS)', text)

If a substring might contain newlines then pass flags=re.DOTALL.

Example:

>>> re.findall(r'BDF.*?E(?:FG|DS)', "FDBFBDFLDJVHVBDVBDBDFEFGEDS")
['BDFLDJVHVBDVBDBDFEFG']

.*? is not greedy, so each match stops at the first ending that follows the leftmost "BDF". Remove the ? to make it greedy and get the longest match instead.
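
To make the difference concrete, here is a small sketch comparing the lazy and greedy forms on the same example string, plus the re.DOTALL case. The variable names and the multi-line sample string are only illustrative assumptions.

```python
import re

text = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"

# Lazy: stop at the first ending found after the leftmost "BDF".
lazy = re.findall(r'BDF.*?E(?:FG|DS)', text)

# Greedy: extend to the last ending found after the leftmost "BDF".
greedy = re.findall(r'BDF.*E(?:FG|DS)', text)

print(lazy)    # ['BDFLDJVHVBDVBDBDFEFG']
print(greedy)  # ['BDFLDJVHVBDVBDBDFEFGEDS']

# Hypothetical input with a newline inside the substring.
multiline = "BDFabc\nxyzEFG"

# By default "." does not match a newline, so nothing is found.
without_dotall = re.findall(r'BDF.*?E(?:FG|DS)', multiline)

# With re.DOTALL, "." also matches the newline.
with_dotall = re.findall(r'BDF.*?E(?:FG|DS)', multiline, flags=re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # ['BDFabc\nxyzEFG']
```

Note that neither form returns overlapping matches: once a match is consumed, the search resumes after it.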

Upvotes: 1

Mauro Baraldi

Reputation: 6575

You can use re.finditer, which yields a match object for each occurrence of the pattern:

>>> import re
>>> s = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"
>>> print([s[a.start():a.end()] for a in re.finditer('BDF', s)])
['BDF', 'BDF']
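
The same approach extends to the full start-and-end pattern. The following is only a sketch: it assumes the lazy pattern from the answer above, and the names pattern and found are illustrative. The advantage of re.finditer over re.findall here is that each match object also carries its position in the string.

```python
import re

s = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"

# Starts with "BDF", ends with "EFG" or "EDS", as short as possible.
pattern = re.compile(r'BDF.*?E(?:FG|DS)')

# Collect (start index, end index, matched text) for every match.
found = [(m.start(), m.end(), m.group()) for m in pattern.finditer(s)]

print(found)  # [(4, 24, 'BDFLDJVHVBDVBDBDFEFG')]
```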

Upvotes: 1

ml-moron

Reputation: 896

Seeing as there is no regex expert here yet, I will propose this solution (BTW I added "BDFEFGEDS" to the end of your string so it would give some results):

import re

s = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"

endings = ['EFG', 'EDS']
matches = []

for ending in endings:
    match = re.findall(r'(?=(BDF.*{0}))'.format(ending), s)
    matches.extend(match)

print(matches)

giving the result:

['BDFLDJVHVBDVBDBDFEFG', 'BDFEFG', 'BDFLDJVHVBDVBDBDFEFGEDS', 'BDFEFGEDS']
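
One caveat: because .* is greedy, the loop reports only the longest substring per start position and ending type. If truly every start/end pairing is wanted, a possible sketch is to locate all starts and all endings with zero-width lookaheads and combine them. The names prefix, starts and endings are illustrative, not part of the code above.

```python
import re

s = "FDBFBDFLDJVHVBDVBDBDFEFGEDS"
prefix = "BDF"

# Lookaheads are zero-width, so overlapping occurrences are all reported.
starts = [m.start() for m in re.finditer(r'(?=BDF)', s)]

# (start, end) span of every occurrence of an allowed ending.
endings = [m.span(1) for m in re.finditer(r'(?=(EFG|EDS))', s)]

# Pair each start with every ending that begins after the prefix.
matches = [
    s[start:end]
    for start in starts
    for ending_start, end in endings
    if ending_start >= start + len(prefix)
]

print(matches)
# ['BDFLDJVHVBDVBDBDFEFG', 'BDFLDJVHVBDVBDBDFEFGEDS', 'BDFEFG', 'BDFEFGEDS']
```

For this string the result contains the same four substrings as above, in a different order.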

Upvotes: 0
