Reputation: 12234

Algorithm to generate a Turing Machine from a Regular Expression

I'm developing a software to generate a Turing Machine from a regular expression. In other words, I want to take a regular expression as input, and programmatically generate a Turing Machine to perform the same task.

This is what I have done:

I've modelled the regular expression as follows:

RegularExpression (interface): the classes below implements this interface
Simple (ie: aaa, bbb, abcde): this is a leaf class; it does not have any subexpressions
ComplexWithoutOr (ie: a(ab)*, (a(ab)*c(b)*)*): this class contains a list of RegularExpression.
ComplexWithOr (exampe: a(a|b)*, (a((ab)*|c(b)*)*): this class contains an Or operation, which contains a list of RegularExpression. It represents the a|b part of the first example and the (ab)*|c(b)* of the second one.
Variable (example: awcw, where w ∈ {a,b}*): this is not yet implemented, but the idea is to model it as a leaf class with some different logic from Simple. It represents the w part of the examples.

When it comes to Turing machine (TM) generation, I have different levels of complexity:

Simple: this type of expression is already working. Generates a new state for each letter and moves right. If in any state, the letter read is not the expected, it starts a "rollback circuit" that finishes with the TM head in the initial position and stops in a not final state.

ComplexWithoutOr: here comes my problem. The algorithm generates a TM for each subexpression and concatenates them. This works for some simple cases, but I have problems with the rollback mechanism.

Problem description

Here is an example input for which my algorithm fails:

(ab)*abac: this is a ComplexWithoutOr expression that contains a ComplexWithOr expression (ab)* (that has a Simple expression inside: ab) and a Simple expression abac

My algorithm generates first an TM₁ for ab. This TM₁ is used by TM₂ for (ab)*, so if TM₁ succeeds, it enters again in TM₁, otherwise TM₁ rolls back and TM₂ finishes with success. In other words, TM₂ cannot fail.

Then, it generates a TM₃ for abac. The output of TM₂ is the input of TM₃. The output of TM₃ is the result of the execution.

Now, suppose this input string: abac. As you can see it matches with the regular expression. So let's see what happens when the TM is executed.

TM₁ executes with success the first time: ab. TM₁ fails the second time for ac and rolls back, putting the TM head in the 3^rd position at a. TM₂ finishes with success and the input is forwarded to TM₃. TM₃ fails, because ac doesn't match abac. So the TM does not recognize abac.

This is a problem. How to solve this?

I'm using Java to develop it, but the language it is not important. My question concerns the algorithm.

Upvotes: 3

Answers (3)

Confusion

Reputation: 16861

Michael Sipser, in Introduction to the Theory of Computation, proves in chapter 1 that regular expressions are equivalent to finite automata in their descriptive power. Part of the proof involves constructing a nondeterministic finite automaton (NDFA) that recognizes the language described by a specific regular expression. I'm not about to copy half that chapter, which would be quite hard due to the notation used, so I suggest you borrow or purchase the book (or perhaps a Google search using these terms will turn up a similar proof) and use that proof as the basis for your algorithm.

As Turing machines can simulate an NDFA, I assume an algorithm to produce an NDFA is good enough.

Upvotes: 1

MAK

Reputation: 26586

It is not entirely clear to me what exactly you are trying to implement. It looks like you want to make a Turing Machine (or any FSM in general) that accepts only those strings that are also accepted by the regular expression. In effect, you want to convert a regular expression to a FSM.

Actually that is exactly what a real regex matcher does under the hood. I think this series of articles by Russ Cox covers a lot of what you want to do.

Upvotes: 2

guest

Reputation: 435

in the chomsky hierarchy a regex is Level3, whereas a TM is Level1. this means, that a TM can produce any regex, but not vice versa.

Upvotes: -1

Algorithm to generate a Turing Machine from a Regular Expression

Problem description

Answers (3)

Related Questions