Reputation: 187
Take the string BANANA as an example: I want to find all possible substrings that begin with a vowel. The result I need looks like this:
"A", "A", "A", "AN", "AN", "ANA", "ANA", "ANAN", "ANANA"
I tried this: re.findall(r"([AIEOU]+\w*)", "BANANA")
but it only finds "ANANA", which seems to be the longest match.
How can I find all the other possible substrings?
Upvotes: 9
Views: 3184
Reputation: 107347
A more pythonic way:
>>> def grouper(s):
...     return [s[i:i+j] for j in range(1, len(s)+1) for i in range(len(s)-j+1)]
...
>>> s = 'BANANA'
>>> vowels = {'A', 'I', 'O', 'U', 'E', 'a', 'i', 'o', 'u', 'e'}
>>> [t for t in grouper(s) if t[0] in vowels]
['A', 'A', 'A', 'AN', 'AN', 'ANA', 'ANA', 'ANAN', 'ANANA']
Benchmark with accepted answer:
from timeit import timeit
# Statement 1: the accepted answer's one-liner
s1 = """
sorted(s[i:j] for i, x in enumerate(s) for j in range(i + 1, len(s) + 1) if x in vowels)
"""
# Statement 2: the grouper approach from this answer
s2 = """
def grouper(s):
    return [s[i:i+j] for j in range(1,len(s)+1) for i in range(len(s)-j+1)]
[t for t in grouper(s) if t[0] in vowels]
"""
print('1st: %s' % timeit(stmt=s1,
                         number=1000000,
                         setup="vowels = 'AIEOU'; s = 'BANANA'"))
print('2nd : %s' % timeit(stmt=s2,
                          number=1000000,
                          setup="vowels = {'A', 'I', 'O', 'U', 'E', 'a', 'i', 'o', 'u', 'e'}; s = 'BANANA'"))
Result:
1st: 6.08756995201
2nd : 5.25555992126
Upvotes: 4
Reputation: 13539
This is a simple way of doing it. I'm sure there's an easier way, though.
def subs(txt, startswith):
    # Yield every substring of txt whose first character is in startswith (case-insensitive).
    for i in range(len(txt)):
        for j in range(1, len(txt) - i + 1):
            if txt[i].lower() in startswith.lower():
                yield txt[i:i + j]
s = 'BANANA'
vowels = 'AEIOU'
print(sorted(subs(s, vowels)))
Upvotes: 6
Reputation: 1753
As already mentioned in the comments, Regex would not be the right way to go about this.
Try this instead:
def get_substr(string):
    holder = []
    for ix, elem in enumerate(string):
        if elem.lower() in "aeiou":
            # Collect every substring that starts at this vowel.
            for r in range(len(string[ix:])):
                holder.append(string[ix:ix+r+1])
    return holder
print(get_substr("BANANA"))
## ['A', 'AN', 'ANA', 'ANAN', 'ANANA', 'A', 'AN', 'ANA', 'A']
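To illustrate why a regex falls short here: re.findall returns non-overlapping matches and \w* is greedy, so the first match starting at the first vowel swallows the rest of the string. A lookahead gets around the overlap problem, but it still yields only the longest candidate per starting position. The variable names below are only illustrative.

```python
import re

s = "BANANA"

# findall scans left to right and returns non-overlapping matches.
# The first vowel is at index 1; the greedy \w* then consumes the rest,
# so nothing is left for further matches.
greedy = re.findall(r"([AIEOU]+\w*)", s)

# A zero-width lookahead allows overlapping matches, but it still gives
# only the longest candidate per starting position, not every substring.
per_start = re.findall(r"(?=([AEIOU]\w*))", s)

print(greedy)     # ['ANANA']
print(per_start)  # ['ANANA', 'ANA', 'A']
```

Getting the shorter substrings as well still needs a loop over the end positions, which is what the function above does.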
Upvotes: 3
Reputation: 336
s="BANANA"
vowels = 'AIEOU'
sorted(s[i:j] for i, x in enumerate(s) for j in range(i + 1, len(s) + 1) if x in vowels)
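For readability, the one-liner can be unpacked into a small generator. This is only a sketch: the function name vowel_substrings and its vowels default are made up here and are not part of the answer. Like the one-liner, the check is case-sensitive.

```python
def vowel_substrings(s, vowels="AEIOU"):
    """Yield every substring of s whose first character is in vowels."""
    for i, ch in enumerate(s):
        if ch not in vowels:
            continue  # substrings starting here cannot qualify
        # j is the exclusive end index, so s[i:j] is never empty
        for j in range(i + 1, len(s) + 1):
            yield s[i:j]

result = sorted(vowel_substrings("BANANA"))
print(result)
# ['A', 'A', 'A', 'AN', 'AN', 'ANA', 'ANA', 'ANAN', 'ANANA']
```

Skipping non-vowel start positions before the inner loop avoids testing the same character once per end index.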
Upvotes: 13