Extract matching string from start

Using Python 2.7

Lets say I have a pattern string abb and a search string abab. I want to get the sliced part of the pattern string that matches as much as possible with the search string from the beginning, i.e., I want the expression to return just ab because only that much of the pattern string is at the beginning of the search string. I have been through the google page on regex and the regex howto page as well but couldn't find a way. Just a hint would be enough.

I have written the following code which gives the right answer, but I'm looking for something more efficient.

pat_str='abb'
search_str='abab'
pat_length=len(pat_str)
for each in xrange(pat_length,0,-1):
    if re.search('^'+pat_str[:each],search_str):
        return_str=pat_str[:each]
        break

print return_str

Edit: Corresponding elements should be same. Break at the first instance where the corresponding elements aren't same and return the preceding string.

Upvotes: 0

Answers (4)

betterworld

Reputation: 264

The regular expression that you are describing is

(a(b(a(b?)?)?)?

You may construct it dynamically from your pattern:

import re
pattern = 'abab'
search_str = 'abb'

# Construct the regexp that I mentioned above
regexp = ''
for c in reversed(pattern):
    regexp = '(%s%s)?' % (re.escape(c), regexp)

m = re.match(regexp, search_str)
print 'Result is %r' % m.group(0)

This is assuming you really want to use regular expressions. The other answers have good solutions without re.

Also try googling for "longest common string prefix".

Update: The regexp was wrong (kind of reversed), fixed it.

Upvotes: 1

Padraic Cunningham

Reputation: 180441

Considering you only want to find a match at the start, just zip, compare and return.

from itertools import izip

def sub_match(s, sub):
    out = ""
    for a, b in izip(s, sub): # zip python3
        if a != b:
            return out
        out += a
    return out

If you want to find any matches in sub:

from itertools import islice,  izip
def sub_match(s, sub):
    all_m = []
    for i in range(len(sub)):
        out = ""
        for a, b in izip(s, islice(sub,i,None)):
            if a != b:
                all_m.append(out)
                break
            out += a
        else:
            all_m.append(out)
    return max(all_m, key=len)

Output:

In [12]: s = "csrababc"

In [13]: p =  "ababc"

In [14]: sub_match(s,p)
Out[14]: 'c'

In [15]: s = "ababdc"

In [16]: sub_match(s,p)
Out[16]: 'abab'

Upvotes: 1

Martin Evans

Reputation: 46759

How about the following:

import itertools

pattern = "ab"
search_string = "abab"

print "".join([x for x,y in itertools.takewhile(lambda p: p[0] == p[1], itertools.izip(pattern, search_string))])

Upvotes: 0

Cyphase

Reputation: 12022

I'm not sure if you can do this with a regular expression (not in a way that makes sense), but you don't need to; it's simple to implement in a function that works for any number of strings:

def longest_match(*strings):
    match = []
    for tup in zip(*strings):
        if len(set(tup)) == 1:
            match.append(tup[0])
        else:
            break

    return ''.join(match)

print(longest_match('abc123', 'abc456'))  # abc
print(longest_match('abc123', 'abc456', 'abyz'))  # ab
print(longest_match('ababc', 'csrabab'))  # <prints empty line>

Upvotes: 2

Extract matching string from start

Answers (4)

Related Questions