Reputation: 7977
Using Python 2.7
Lets say I have a pattern string abb
and a search string abab
.
I want to get the sliced part of the pattern string that matches as much as possible with the search string from the beginning, i.e., I want the expression to return just ab
because only that much of the pattern string is at the beginning of the search string. I have been through the google page on regex and the regex howto page as well but couldn't find a way. Just a hint would be enough.
I have written the following code which gives the right answer, but I'm looking for something more efficient.
pat_str='abb'
search_str='abab'
pat_length=len(pat_str)
for each in xrange(pat_length,0,-1):
if re.search('^'+pat_str[:each],search_str):
return_str=pat_str[:each]
break
print return_str
Edit: Corresponding elements should be same. Break at the first instance where the corresponding elements aren't same and return the preceding string.
Upvotes: 0
Views: 128
Reputation: 264
The regular expression that you are describing is
(a(b(a(b?)?)?)?
You may construct it dynamically from your pattern:
import re
pattern = 'abab'
search_str = 'abb'
# Construct the regexp that I mentioned above
regexp = ''
for c in reversed(pattern):
regexp = '(%s%s)?' % (re.escape(c), regexp)
m = re.match(regexp, search_str)
print 'Result is %r' % m.group(0)
This is assuming you really want to use regular expressions. The other answers have good solutions without re.
Also try googling for "longest common string prefix".
Update: The regexp was wrong (kind of reversed), fixed it.
Upvotes: 1
Reputation: 180441
Considering you only want to find a match at the start, just zip, compare and return.
from itertools import izip
def sub_match(s, sub):
out = ""
for a, b in izip(s, sub): # zip python3
if a != b:
return out
out += a
return out
If you want to find any matches in sub:
from itertools import islice, izip
def sub_match(s, sub):
all_m = []
for i in range(len(sub)):
out = ""
for a, b in izip(s, islice(sub,i,None)):
if a != b:
all_m.append(out)
break
out += a
else:
all_m.append(out)
return max(all_m, key=len)
Output:
In [12]: s = "csrababc"
In [13]: p = "ababc"
In [14]: sub_match(s,p)
Out[14]: 'c'
In [15]: s = "ababdc"
In [16]: sub_match(s,p)
Out[16]: 'abab'
Upvotes: 1
Reputation: 46759
How about the following:
import itertools
pattern = "ab"
search_string = "abab"
print "".join([x for x,y in itertools.takewhile(lambda p: p[0] == p[1], itertools.izip(pattern, search_string))])
Upvotes: 0
Reputation: 12022
I'm not sure if you can do this with a regular expression (not in a way that makes sense), but you don't need to; it's simple to implement in a function that works for any number of strings:
def longest_match(*strings):
match = []
for tup in zip(*strings):
if len(set(tup)) == 1:
match.append(tup[0])
else:
break
return ''.join(match)
print(longest_match('abc123', 'abc456')) # abc
print(longest_match('abc123', 'abc456', 'abyz')) # ab
print(longest_match('ababc', 'csrabab')) # <prints empty line>
Upvotes: 2