Reputation: 4133
Can you please suggest if it possible and how to define regular expression for the next problem:
I have 2 'binary' strings joined by a delimiter. Let's say the delimiter is always 3.
Example of input data:
11000031000111011001101111
0111000301000111011001101111
001310001110110011011110
On the right part of the delimiter, I need to select all '1+' blobs, but I don't want to select 1+ blobs with length equal to 1+ blobs on the left part of the delimiter.
So for example on this input:
11000031000111011001101111
There are these 1+ blobs:
left: 11
right: 1, 111, 11, 11, 1111
I would need to select all right 1+ blobs, but the '11', since those are also present on the left.
So on that example I would need to match: 1,111,1111
Other examples:
0111000301000111011001101111 ==> 1,11,11,1111
001310001110110011011110 ==> 111,11,11,1111
I also need to have location of each string (Match Object). It is important to understand complexity of this task and if it is possible to construct FSM. What if limit the size of left part (for example < 8)
Upvotes: 0
Views: 194
Reputation: 5308
This could be doable with regexes (which are more powerful than regular languages, as stated on the comments). But for that we would need the regex engine to support varying length look-behinds.
Unfortunately, most regex engines only support fixed-length look-behinds.
However, one could 'reverse' the original string so that look-behinds would become look-aheads (which can be varying-length)
This regex should work for reversed strings:
(?<!1)(1+)(?!1)(?=\d*3)(?!.*3\d*(?<!1)\1(?!1))
You have a demo here.
Bear in mind that on the demo the strings have been reversed.
This is how the regex works:
(?<!1)(1+)(?!1) # a blob of ones, capture it. Cannot match slices of the blob
(?=\d*3) # must be followed by some '3'
(?! # must not be followed by:
.* # any prefix
3 # then a '3'
\d* # then some numbers
(?<!1)\1(?!1) # and then a blob of ones equal to the captured one
)
Or course, this may be out of the 'spirit' of the original question, since it requires to reverse the input string.
However, I think it is still valuable because in theory one could make a similar regex without reversing the string, if using a regex engine that supports varying length look behinds.
Upvotes: 1
Reputation: 8010
Since you specifically ask not for only a regular expression, but a finite state machine, I can specifically prove the impossibility. In a finite state machine, you must have some finite number n
states. We need a state for every length of sequence of 1
s found on the left side of the 3. Since we can have infinitely many possible lengths of sequences of 1
s, it is impossible to encode this information in finitely many states.
If you specifically limit the size of the left part, you effectively limit the maximum size of a sequence of 1
s. This makes it possible, though still not practical.
Suppose the size on the left is n
. You would need a state for every possible combination of lengths of 1 that appear, which is O(2^n)
. Good luck writing a regular expression for that. You would probably effectively end up enumerating most possible combinations and use very little of the actual state machine logic.
Example state machine for n=4
: Note more than 16 states, and this is only the left side. Each "accepting" state here would need more complex logic to match the right side.
Upvotes: 2