user2870222

Reputation: 269

Mark the shortest overlapping match using regular expressions

This post shows how to find the shortest overlapping match using regex. One of the answers shows how to get the shortest match, but I am struggling with how to locate the shortest match and mark its position, or substitute it with another string.

So in the given string,

A|B|A|F|B|C|D|E|F|G

and the pattern I want to locate is:

my_pattern = 'A.*?B.*?C'

How can I identify the shortest match and mark it in the original string, like below?

A|B|[A|F|B|C]|D|E|F|G

or substitute:

A|B|AAA|F|BBB|CCC|D|E|F|G

Upvotes: 5

Views: 117

Answers (3)

vks

Reputation: 67968

(A[^A]*?B[^B]*?C)

You can use this simple regex. Replace with [\1].


import re
x = "A|B|A|F|B|C|D|A|B|C"
print(re.sub("(" + re.escape(min(re.findall(r"(A[^A]*?B[^B]*?C)", x), key=len)) + ")", r"[\1]", x))
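
For comparison, here is a minimal sketch of the plain replacement described above (the regex with [\1] and no min() step), applied to the string from the question and to the string x used in this answer. The variable names are only illustrative. Note that the plain replacement marks every match of the regex, not only the shortest one, which is why the one-liner above adds the min(..., key=len) step.

```python
import re

# The regex from this answer: no second A between A and B, no second B between B and C.
pattern = re.compile(r"(A[^A]*?B[^B]*?C)")

# String from the question: only one match exists, so it is the one that gets marked.
marked_question = pattern.sub(r"[\1]", "A|B|A|F|B|C|D|E|F|G")

# String x from this answer: two matches exist, and the plain replacement marks both.
marked_both = pattern.sub(r"[\1]", "A|B|A|F|B|C|D|A|B|C")

print(marked_question)
print(marked_both)
```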

Upvotes: 1

anubhava

Reputation: 785196

One way is to use lookahead between A and B and then B and C like this:

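import re
# Tempered dots: no A or C may occur between A and B, and no A or B between B and C
p = re.compile(r'A(?:(?![AC]).)*B(?:(?![AB]).)*C')
test_str = "A|B|A|F|B|C|D|E|F|G"
# \g<0> refers to the whole match; Python's re does not understand $0
result = re.sub(p, r"[\g<0>]", test_str)
# A|B|[A|F|B|C]|D|E|F|G

test_str = "A|B|C|F|B|C|D|E|F|G"
result = re.sub(p, r"[\g<0>]", test_str)
# [A|B|C]|F|B|C|D|E|F|G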
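
The same lookahead idea can also cover the substitution variant from the question (A|B|AAA|F|BBB|CCC|D|E|F|G). This is a minimal sketch, assuming the text between the letters should be kept unchanged. The two capturing groups are an addition for illustration and are not part of the regex above.

```python
import re

# Same tempered pattern, but the stretches between A, B and C are captured
# so they can be copied back into the replacement unchanged.
p = re.compile(r"A((?:(?![AC]).)*)B((?:(?![AB]).)*)C")

test_str = "A|B|A|F|B|C|D|E|F|G"
# \1 is the text between A and B, \2 the text between B and C.
result = p.sub(r"AAA\1BBB\2CCC", test_str)
print(result)
```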


Upvotes: 2

Kasravnd

Reputation: 107287

I suggest using Tim Pietzcker's answer with re.sub:

>>> import re
>>> s = 'A|B|A|F|B|C|D|E|F|G'
>>> p = re.findall(r'(?=(A.*?B.*?C))', s)
>>> re.sub(r'({})'.format(re.escape(min(p, key=len))), r'[\1]', s, flags=re.DOTALL)
'A|B|[A|F|B|C]|D|E|F|G'
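
One thing to be aware of: replacing the escaped text of the shortest match marks every place where that same text occurs. If only one position should be marked, the match offsets can be used instead. Below is a minimal sketch of that idea. The helper name mark_shortest and its parameters are hypothetical, and it assumes the pattern contains no numbered backreferences of its own, since wrapping it in a group shifts the group numbers.

```python
import re

def mark_shortest(pattern, text, left="[", right="]"):
    """Bracket the first shortest (possibly overlapping) match of pattern in text."""
    # A capturing group inside a lookahead yields a candidate at every start position.
    lookahead = re.compile(r"(?=({}))".format(pattern), re.DOTALL)
    spans = [m.span(1) for m in lookahead.finditer(text)]
    if not spans:
        return text  # nothing matched, so leave the text untouched
    # min() returns the first of several equally short candidates.
    start, end = min(spans, key=lambda span: span[1] - span[0])
    return text[:start] + left + text[start:end] + right + text[end:]

print(mark_shortest(r"A.*?B.*?C", "A|B|A|F|B|C|D|E|F|G"))
print(mark_shortest(r"A.*?B.*?C", "A|B|C|X|A|B|C"))
```

In the second call the text A|B|C occurs twice, and only the first occurrence is bracketed, whereas the string-replacement approach would bracket both.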

Upvotes: 2
