Johannes Schaub - litb
Johannes Schaub - litb

Reputation: 506877

Regular expression for string of digits with no repeated digits?

I'm reading through the dragon book and trying to solve an exercise that is stated as follows

Write regular definitions for the following languages:

  • All strings of digits with no repeated digits. Hint: Try this problem first with a few digits, such as { 0, 1, 2 }.

Despite having tried to solve it for hours, I can't imagine a solution, beside the extremely wordy

d0 -> 0?
d1 -> 1?
d2 -> 2?
d3 -> 3?
d4 -> 4?
d5 -> 5?
d6 -> 6?
d7 -> 7?
d8 -> 8?
d9 -> 9?
d10 -> d0d1d2d3d4d5d6d7d8d9 | d0d1d2d3d4d5d6d7d9d8 | ...

Hence having to write 10! alternatives in d10. Since we shall write this regular definition, I doubt that this is a proper solution. Can you help me please?

Upvotes: 19

Views: 14298

Answers (7)

Zhichuang Sun
Zhichuang Sun

Reputation: 1

I don't think there is a neat way to write a regular expression to solve this problem without listing all the possibilities. But I find a way to reduce the complexity from O(N!) to O(2^N) by defining the DFA in the following way. In the DFA I'm going to construct, a state represent whether any digit has appeared or not.

Take strings consisting of {0, 1, 2} for example, 0 represent '0' has appeared once, 0' represent '0' hasn't appeared. All the states will look like this {012, 0'1'2', 0'12, 01'2, 012',012', 01'2, 0'12}. There will be 2^3=8 states at all. And the DFA looks like as follows: DFA for strings with no repeating digits

You can easily extend it to {0,1,2,...,9}. But there will be 1024 states at all. However, I think it's the most compact DFA with an intuitive proof. For the reason that each state has unique meaning and can't be merged further.

Upvotes: 0

riwalk
riwalk

Reputation: 14223

So the question didn't necessarily ask you to write a regular expression, it asked you to provide a regular definition, which I interpret to include NFA's. It turns out it doesn't matter which you use, as all NFA's can be shown to be mathematically equivalent to regular expressions.

Using the digits 0, 1, and 2, a valid NFA would be the following (sorry for the crummy diagram):

enter image description here

Each state represents the last digit scanned in the input, and there are no loops on any of the nodes, therefore this is an accurate representation of a string with no repeated digits from the set {0,1,2}. Extending this is trivial (although it requires a large whiteboard :) ).

NOTE: I am making the assumption that the string "0102" IS valid, but the string "0012" is not.

This can be converted to a regular expression (although it will be painful) by using the algorithm described here.

Upvotes: 10

user515430
user515430

Reputation: 3351

A regular definition is a sequence of definitions on the form

d1 -> r1

d2 -> r2

...

dn -> rn

Now make the following definitions:

Zero -> 0

One -> Zero (1 Zero)* | (Zero 1)+ | 1 (Zero 1)* | (1 Zero)+

Two -> One (2 One)* | (One 2)+ | 2 (One 2)* | (2 One)+

Three -> Two (3 Two)* | (Two 3)+ | 3 (Two 3)* | (3 Two)+

Four -> Three (4 Three)* | (Three 4)+ | 4 (Three 4)* | (4 Three)+

...

Nine -> Eight (9 Eight)* | (Eight 9)+ | 9 (Eight 9)* | (9 Eight)+

Upvotes: 1

ridgerunner
ridgerunner

Reputation: 34385

Not sure what you mean by "Regular Expression" in your question title. But if the regex engine supports negative lookahead, this is easily accomplished. (Here's a PHP snippet)

$re = '/# Match string of digits having no repeated digits.
    ^                 # Anchor to start of string.
    (?![^0]*0[^0]*0)  # Assert 0 does not occur twice.
    (?![^1]*1[^1]*1)  # Assert 1 does not occur twice.
    (?![^2]*2[^2]*2)  # Assert 2 does not occur twice.
    (?![^3]*3[^3]*3)  # Assert 3 does not occur twice.
    (?![^4]*4[^4]*4)  # Assert 4 does not occur twice.
    (?![^5]*5[^5]*5)  # Assert 5 does not occur twice.
    (?![^6]*6[^6]*6)  # Assert 6 does not occur twice.
    (?![^7]*7[^7]*7)  # Assert 7 does not occur twice.
    (?![^8]*8[^8]*8)  # Assert 8 does not occur twice.
    (?![^9]*9[^9]*9)  # Assert 9 does not occur twice.
    [0-9]+            # Match string of only digits.
    $                 # Anchor to end of string.
    /x';

Upvotes: 0

Guy Sirton
Guy Sirton

Reputation: 8401

Here's one possible construction:

  • A regex for a string that contains at the most a single '0' digit looks like (1-9)* (0|epsilon) (1-9)* - so any number of 1-9 digits, followed by zero or 1 '0's followed by any number of 1-9 digits.
  • We can now move forward by noticing that if there's only a single '1' digit it will be either to the left or to the right of the '0' digit (or the epsilon representing the missing zero digit). So we can construct a regular expression having these two cases or'ed (|) together.
  • We can now drill further down saying that if there's only a single '2' digit it can be to the right or the left of the 1 digit in it's two possible relative locations to the '0' digit.
  • So we're building a binary tree and the number of ORed regex is on the order of 2^10 which is the same order of the FSM accepting this language. An FSM for accepting the language should have (2^10 + 1) states with each state n can be seen as it's binary representation n0n1n2n3n4n5n6n7n8n9 meaning n0 = seen digit '0', n1 = seen digit '1'. and a repeat digit transitioning to the single non-accepting state. The initial state being zero.

If you're allowed to complement then a regex that has more than a single '0' digit would be (0-9)* 0 (0-9)* 0 (0-9)*, repeat for all digits, complement.

You can definitely be a lot more compact for Peter Taylors interpretation of no two consecutive digits that are the same. Clearly the state for that problem is much smaller.

SUCCINCTNESS OF THE COMPLEMENT AND INTERSECTION OF REGULAR EXPRESSIONS

"A study in [2] reveals that most of the one-unambiguous regular expression used in practice take a very simple form: every alphabet symbol occurs at most once. We refer to those as single-occurrence regular expressions (SOREs) and show a tight exponential lower bound for intersection."

...

"In this section, we show that in defining the complement of a single regular expression, a double-exponential size increase cannot be avoided in general. In contrast, when the expression is one-unambiguous its complement can be computed in polynomial time."

Upvotes: 3

blubb
blubb

Reputation: 9890

(I don't know which variant of regular expressions you are referring to, if any, thus I will provide hints for the most general form of regular expressions.)

I find it a rather odd application of regular expressions since this is exactly one of the cases where they don't really provide a big benefit over other (more trivial to understand) solutions.

However, if you absolutely want to use regex, here's a hint (no solution since it's an exercise, let me know if you need more hints):

Regex allows you to recognize regular languages, which are generally accepted by deterministic finite state machines. Try to find a state machine which accepts exactly the words in the specified pattern. It'll require 2^10 = 1024 states but not 10! = 3628800.

Upvotes: 1

Scott Whitlock
Scott Whitlock

Reputation: 13839

Instead of trying to write a definition that only defines what you want, what if you tell it to generate a list of all strings up digits up to 10 digits in length, including duplicates, and then subtract the ones that contain two zeros, two ones... etc.? Would that work?

Upvotes: 2

Related Questions