Will
Will

Reputation: 4631

Python: Regular expression for exactly $n$ [?o], with at least one [o]

I'm trying to find the right python regex to solve this problem:

Given a string composed of the characters ?, _, and o, find a substring of length n that contains only ? and o and at least one o.

This is what I've come up with, but it doesn't seem to be working:

n = 3
r = re.compile("
  (?=[o?]{"+str(n)+","+str(n)+"})  # first find a block of n characters that are either 'o' or '?'
  [o?]*o[o?]*                      # then check if that block has at least one 'o'
, re.VERBOSE")

I think that the problem with the above is that the lookahead correctly finds a potential block, but then the [o?]*o[o?]* is greedily consuming characters beyond the limit of the block that was found by the first part. I was hoping that the lookahead would confine the subsequent expression to match within the bounds of what the lookahead matched, but I guess that's not how it works.

I'll probably end up doing this another way, since this is probably too much logic for regular expressions to be the best method, but I wanted to know how to do this in a single regex.

Upvotes: 2

Views: 118

Answers (2)

nneonneo
nneonneo

Reputation: 179602

You don't even need regex.

pieces = s.split('_') # pieces are composed of only ? and o
for piece in pieces:
    if 'o' in piece and len(piece) >= n: # piece must have a substring of length n with o in it somewhere
        print "found it"
        break
else:
    print "didn't find it"

Upvotes: 5

Explosion Pills
Explosion Pills

Reputation: 191789

You can't use the lookahead to restrict input as all it does is look ahead (obviously) for the input. If there's more input after what you specify in the lookahead, it can still be found. You can use the lookahead just to make sure that there is an o (since this is necessary) and make the regex simpler.

re.compile("(?=.{0," + str(n - 1) + "}o)[o?]{" + str(n) + "}")

Upvotes: 4

Related Questions