Reputation: 12234
I'm developing a software to generate a Turing Machine from a regular expression. In other words, I want to take a regular expression as input, and programmatically generate a Turing Machine to perform the same task.
This is what I have done:
I've modelled the regular expression as follows:
RegularExpression
(interface): the classes below implements this interface
Simple
(ie: aaa
, bbb
, abcde
): this is a leaf class; it does not have any subexpressions
ComplexWithoutOr
(ie: a(ab)*
, (a(ab)*c(b)*)*
): this class contains a list of RegularExpression
.
ComplexWithOr
(exampe: a(a|b)*
, (a((ab)*|c(b)*)*
): this class contains an Or
operation, which contains a list of RegularExpression
. It represents the a|b
part of the first example and the (ab)*|c(b)*
of the second one.
Variable
(example: awcw
, where w ∈ {a,b}*): this is not yet implemented, but the idea is to model it as a leaf class with some different logic from Simple
. It represents the w
part of the examples.
When it comes to Turing machine (TM) generation, I have different levels of complexity:
Simple
: this type of expression is already working. Generates a new state for each letter and moves right. If in any state, the letter read is not the expected, it starts a "rollback circuit" that finishes with the TM head in the initial position and stops in a not final state.
ComplexWithoutOr
: here comes my problem. The algorithm generates a TM for each subexpression and concatenates them. This works for some simple cases, but I have problems with the rollback mechanism.
Here is an example input for which my algorithm fails:
(ab)*abac
: this is a ComplexWithoutOr
expression that contains a ComplexWithOr
expression (ab)*
(that has a Simple
expression inside: ab
) and a Simple
expression abac
My algorithm generates first an TM1 for ab
. This TM1 is used by TM2 for (ab)*
, so if TM1 succeeds, it enters again in TM1, otherwise TM1 rolls back and TM2 finishes with success. In other words, TM2 cannot fail.
Then, it generates a TM3 for abac
. The output of TM2 is the input of TM3. The output of TM3 is the result of the execution.
Now, suppose this input string: abac
. As you can see it matches with the regular expression. So let's see what happens when the TM is executed.
TM1 executes with success the first time: ab
. TM1 fails the second time for ac
and rolls back, putting the TM head in the 3rd position at a
. TM2 finishes with success and the input is forwarded to TM3. TM3 fails, because ac
doesn't match abac
. So the TM does not recognize abac
.
This is a problem. How to solve this?
I'm using Java to develop it, but the language it is not important. My question concerns the algorithm.
Upvotes: 3
Views: 4982
Reputation: 16861
Michael Sipser, in Introduction to the Theory of Computation, proves in chapter 1 that regular expressions are equivalent to finite automata in their descriptive power. Part of the proof involves constructing a nondeterministic finite automaton (NDFA) that recognizes the language described by a specific regular expression. I'm not about to copy half that chapter, which would be quite hard due to the notation used, so I suggest you borrow or purchase the book (or perhaps a Google search using these terms will turn up a similar proof) and use that proof as the basis for your algorithm.
As Turing machines can simulate an NDFA, I assume an algorithm to produce an NDFA is good enough.
Upvotes: 1
Reputation: 26586
It is not entirely clear to me what exactly you are trying to implement. It looks like you want to make a Turing Machine (or any FSM in general) that accepts only those strings that are also accepted by the regular expression. In effect, you want to convert a regular expression to a FSM.
Actually that is exactly what a real regex matcher does under the hood. I think this series of articles by Russ Cox covers a lot of what you want to do.
Upvotes: 2
Reputation: 435
in the chomsky hierarchy a regex is Level3, whereas a TM is Level1. this means, that a TM can produce any regex, but not vice versa.
Upvotes: -1