Reputation: 506877
I'm reading through the dragon book and trying to solve an exercise that is stated as follows
Write regular definitions for the following languages:
- All strings of digits with no repeated digits. Hint: Try this problem first with a few digits, such as { 0, 1, 2 }.
Despite having tried to solve it for hours, I can't imagine a solution, beside the extremely wordy
d0 -> 0?
d1 -> 1?
d2 -> 2?
d3 -> 3?
d4 -> 4?
d5 -> 5?
d6 -> 6?
d7 -> 7?
d8 -> 8?
d9 -> 9?
d10 -> d0d1d2d3d4d5d6d7d8d9 | d0d1d2d3d4d5d6d7d9d8 | ...
Hence having to write 10!
alternatives in d10
. Since we shall write this regular definition, I doubt that this is a proper solution. Can you help me please?
Upvotes: 19
Views: 14298
Reputation: 1
I don't think there is a neat way to write a regular expression to solve this problem without listing all the possibilities. But I find a way to reduce the complexity from O(N!) to O(2^N) by defining the DFA in the following way. In the DFA I'm going to construct, a state represent whether any digit has appeared or not.
Take strings consisting of {0, 1, 2} for example, 0 represent '0' has appeared once, 0' represent '0' hasn't appeared. All the states will look like this {012, 0'1'2', 0'12, 01'2, 012',012', 01'2, 0'12}. There will be 2^3=8 states at all. And the DFA looks like as follows: DFA for strings with no repeating digits
You can easily extend it to {0,1,2,...,9}. But there will be 1024 states at all. However, I think it's the most compact DFA with an intuitive proof. For the reason that each state has unique meaning and can't be merged further.
Upvotes: 0
Reputation: 14223
So the question didn't necessarily ask you to write a regular expression, it asked you to provide a regular definition, which I interpret to include NFA's. It turns out it doesn't matter which you use, as all NFA's can be shown to be mathematically equivalent to regular expressions.
Using the digits 0, 1, and 2, a valid NFA would be the following (sorry for the crummy diagram):
Each state represents the last digit scanned in the input, and there are no loops on any of the nodes, therefore this is an accurate representation of a string with no repeated digits from the set {0,1,2}. Extending this is trivial (although it requires a large whiteboard :) ).
NOTE: I am making the assumption that the string "0102" IS valid, but the string "0012" is not.
This can be converted to a regular expression (although it will be painful) by using the algorithm described here.
Upvotes: 10
Reputation: 3351
A regular definition is a sequence of definitions on the form
d1 -> r1
d2 -> r2
...
dn -> rn
Now make the following definitions:
Zero -> 0
One -> Zero (1 Zero)* | (Zero 1)+ | 1 (Zero 1)* | (1 Zero)+
Two -> One (2 One)* | (One 2)+ | 2 (One 2)* | (2 One)+
Three -> Two (3 Two)* | (Two 3)+ | 3 (Two 3)* | (3 Two)+
Four -> Three (4 Three)* | (Three 4)+ | 4 (Three 4)* | (4 Three)+
...
Nine -> Eight (9 Eight)* | (Eight 9)+ | 9 (Eight 9)* | (9 Eight)+
Upvotes: 1
Reputation: 34385
Not sure what you mean by "Regular Expression" in your question title. But if the regex engine supports negative lookahead, this is easily accomplished. (Here's a PHP snippet)
$re = '/# Match string of digits having no repeated digits.
^ # Anchor to start of string.
(?![^0]*0[^0]*0) # Assert 0 does not occur twice.
(?![^1]*1[^1]*1) # Assert 1 does not occur twice.
(?![^2]*2[^2]*2) # Assert 2 does not occur twice.
(?![^3]*3[^3]*3) # Assert 3 does not occur twice.
(?![^4]*4[^4]*4) # Assert 4 does not occur twice.
(?![^5]*5[^5]*5) # Assert 5 does not occur twice.
(?![^6]*6[^6]*6) # Assert 6 does not occur twice.
(?![^7]*7[^7]*7) # Assert 7 does not occur twice.
(?![^8]*8[^8]*8) # Assert 8 does not occur twice.
(?![^9]*9[^9]*9) # Assert 9 does not occur twice.
[0-9]+ # Match string of only digits.
$ # Anchor to end of string.
/x';
Upvotes: 0
Reputation: 8401
Here's one possible construction:
If you're allowed to complement then a regex that has more than a single '0' digit would be (0-9)* 0 (0-9)* 0 (0-9)*, repeat for all digits, complement.
You can definitely be a lot more compact for Peter Taylors interpretation of no two consecutive digits that are the same. Clearly the state for that problem is much smaller.
SUCCINCTNESS OF THE COMPLEMENT AND INTERSECTION OF REGULAR EXPRESSIONS
"A study in [2] reveals that most of the one-unambiguous regular expression used in practice take a very simple form: every alphabet symbol occurs at most once. We refer to those as single-occurrence regular expressions (SOREs) and show a tight exponential lower bound for intersection."
...
"In this section, we show that in defining the complement of a single regular expression, a double-exponential size increase cannot be avoided in general. In contrast, when the expression is one-unambiguous its complement can be computed in polynomial time."
Upvotes: 3
Reputation: 9890
(I don't know which variant of regular expressions you are referring to, if any, thus I will provide hints for the most general form of regular expressions.)
I find it a rather odd application of regular expressions since this is exactly one of the cases where they don't really provide a big benefit over other (more trivial to understand) solutions.
However, if you absolutely want to use regex, here's a hint (no solution since it's an exercise, let me know if you need more hints):
Regex allows you to recognize regular languages, which are generally accepted by deterministic finite state machines. Try to find a state machine which accepts exactly the words in the specified pattern. It'll require 2^10 = 1024
states but not 10! = 3628800
.
Upvotes: 1
Reputation: 13839
Instead of trying to write a definition that only defines what you want, what if you tell it to generate a list of all strings up digits up to 10 digits in length, including duplicates, and then subtract the ones that contain two zeros, two ones... etc.? Would that work?
Upvotes: 2