Derek
Derek

Reputation: 11915

Minimal DFA for whole-words in regex

In creating a DFA for a regex, I have noticed that whole-words add to the number of states, even though analytically, they look similar to regex with fewer states.

For example, to me, (a|b)+ looks the same as (hello|world)+

If I had a matching string, it would be fairly easily to find/replace "hello" with "a" and "world" with "b" and viceversa. So my question is, why aren't "hello" and "world" counted as single states?

Upvotes: 0

Views: 59

Answers (1)

Scott Hunter
Scott Hunter

Reputation: 49813

Because DFAs are mind-numbingly simple to implement with the simpler definition of states, at the cost of having more states. What you suggest is fine for describing how you want a DFA to work, and have a straight-forward correspondence to the traditional DFAs. But it doesn't allow you to say anything more.

It is similar to the use of NFAs: they're easier to design and (maybe) think about, but have no more power, and there is a well-defined algorithm to translate them into DFAs (again, at the cost of introducing states).

Think of DFAs using single-character transitions as the "machine language" of regular expressions (which are NOT the same thing as regex, to get pedantic).

Upvotes: 1

Related Questions