user529287

Reputation: 63

Can this be done in one regex?

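I need a regex that matches a string made up of one digit repeated at least twice, optionally separated by spaces, with no leading or trailing space and no other characters: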

Matches:

11
11111
1  1 1 1 1
1  1
11 1 1 1 1 1
1           1
1    1      1

No matches:

1             has only one digit
"11111 "      has a space at the end
" 11111"      has a space at the beginning
12            digits are different
11:           has another character

I know a regex for each of these requirements, but that means running 4 separate regex tests. Can it be done in one regex?
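For comparison, here is a sketch of the four-test approach described above, in Python. The four patterns are an assumption (the question does not show them), reconstructed from the example lists: only digits and spaces, no space at either end, at least two digits, and no two different digits.

```python
import re

# Hypothetical versions of the four separate tests; the asker's real
# patterns are not shown in the question.
ONLY_DIGITS_AND_SPACES = re.compile(r"[\d ]*")         # fullmatch
NO_EDGE_SPACES = re.compile(r"(?! ).*(?<! )")          # fullmatch
AT_LEAST_TWO_DIGITS = re.compile(r"\d.*\d")            # search
TWO_DIFFERENT_DIGITS = re.compile(r"(\d).*(?!\1)\d")   # search; must NOT match


def four_tests(s: str) -> bool:
    """Accept s only if all four hypothetical checks agree."""
    return (ONLY_DIGITS_AND_SPACES.fullmatch(s) is not None
            and NO_EDGE_SPACES.fullmatch(s) is not None
            and AT_LEAST_TWO_DIGITS.search(s) is not None
            and TWO_DIFFERENT_DIGITS.search(s) is None)


for s in ["11", "1  1 1 1 1", "1", "11111 ", " 11111", "12", "11:"]:
    print(repr(s), four_tests(s))
```

Each answer below folds these checks into a single pattern by capturing the first digit and reusing it through a backreference.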

Upvotes: 6

Views: 188

Answers (4)

Sayan Malakshinov

Reputation: 8655

/^(\d)(\1| )*\1$/
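A quick way to check this pattern against the question's examples, sketched in Python (the list names and the use of `re.match` are illustrative choices, not part of the answer):

```python
import re

# Sayan Malakshinov's pattern, unchanged; its own ^ and $ anchor both ends.
SAYAN = re.compile(r"^(\d)(\1| )*\1$")

SHOULD_MATCH = ["11", "11111", "1  1 1 1 1", "1  1", "11 1 1 1 1 1",
                "1           1", "1    1      1"]
SHOULD_NOT_MATCH = ["1", "11111 ", " 11111", "12", "11:"]

for s in SHOULD_MATCH:
    assert SAYAN.match(s), s
for s in SHOULD_NOT_MATCH:
    assert not SAYAN.match(s), s
print("all examples behave as expected")
```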

Upvotes: 0

Nakilon

Reputation: 35084

^(\d)( *\1)+$
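This pattern reads as "a digit, then one or more runs of zero or more spaces followed by that same digit", which rules out a trailing space and forces at least two digits. It should accept exactly the same strings as the alternation-based answers. A brute-force comparison over a small alphabet is a cheap sanity check (not a proof); `expected()` is a hypothetical plain-Python restatement of the requirements as inferred from the question's examples.

```python
import itertools
import re

# The three patterns posted in this thread.
PATTERNS = {
    "Sayan Malakshinov": re.compile(r"^(\d)(\1| )*\1$"),
    "Nakilon": re.compile(r"^(\d)( *\1)+$"),
    "codaddict": re.compile(r"^(\d)(?:\1| )*\1$"),
}


def expected(s: str) -> bool:
    """Requirements inferred from the examples (an assumption)."""
    if not s or s[0] == " " or s[-1] == " ":
        return False
    if any(c not in "0123456789 " for c in s):
        return False
    digits = s.replace(" ", "")
    return len(digits) >= 2 and len(set(digits)) == 1


def check(max_len: int, alphabet: str = "12 :") -> int:
    """Compare every pattern with expected() on all strings up to max_len."""
    checked = 0
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            for name, pat in PATTERNS.items():
                assert (pat.fullmatch(s) is not None) == expected(s), (name, s)
            checked += 1
    return checked


print(check(6))  # prints 5461 (all strings of length 0..6 over "12 :")
```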


Upvotes: 1

codaddict

Reputation: 455122

Yes, it can be done in one regex:

^(\d)(?:\1| )*\1$


Explanation:

^      - Start anchor
(      - Start parenthesis for capturing
 \d    - A digit
)      - End parenthesis for capturing
(?:    - Start parenthesis for grouping only
\1     - Back reference referring to the digit capture before
|      - Or
' '    - A literal space
)      - End grouping parenthesis
*      - Zero or more repetitions of the preceding group
\1     - The digit captured before
$      - End anchor
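The same pattern can be written in Python's verbose mode so each piece of the explanation sits next to its token. This is a sketch with two Python-specific choices that go beyond the answer: `fullmatch()` instead of relying on `$` (in Python, `$` also matches just before a trailing newline), and `re.ASCII` so that `\d` means only `[0-9]` rather than any Unicode decimal digit. The names `DIGIT_RUN` and `is_valid` are hypothetical.

```python
import re

# codaddict's pattern in re.VERBOSE form. Unescaped whitespace is ignored in
# verbose mode (except inside a class), so the literal space is written [ ].
DIGIT_RUN = re.compile(r"""
    ^          # start anchor
    (\d)       # capture the first digit as group 1
    (?:        # non-capturing group:
        \1     #   the same digit again
      |        #   or
        [ ]    #   a literal space
    )*         # zero or more repetitions of the group
    \1         # the string must end with the captured digit
    $          # end anchor
""", re.VERBOSE | re.ASCII)


def is_valid(s: str) -> bool:
    # fullmatch rejects "11\n", which DIGIT_RUN.match would accept
    # because $ can match before a final newline.
    return DIGIT_RUN.fullmatch(s) is not None


print(is_valid("1  1"), is_valid("11\n"), DIGIT_RUN.match("11\n") is not None)
```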

Upvotes: 14

tchrist

Reputation: 80405

Consider this program:

#!/usr/bin/perl -l
$_ = "3 33 3 3";
print /^(\d)[\1 ]*\1$/      ? 1 : 0;
print /^(\d)(?:\1| )*\1$/   ? 1 : 0;

It produces the output

0
1
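The same experiment in Python gives the same result, because Python's `re` also reads `[\1 ]` as a plain character class (a sketch; the variable names are illustrative):

```python
import re

s = "3 33 3 3"

# The two patterns from the Perl program; both are anchored with ^ and $.
with_class = re.search(r"^(\d)[\1 ]*\1$", s)
with_group = re.search(r"^(\d)(?:\1| )*\1$", s)

print(1 if with_class else 0)  # prints 0
print(1 if with_group else 0)  # prints 1
```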

The reason becomes clear when you look at the compiled regexes (with the program saved as /tmp/a):

perl -c -Mre=debug /tmp/a
Compiling REx "^(\d)[\1 ]*\1$"
synthetic stclass "ANYOF[0-9][{unicode_all}]".
Final program:
   1: BOL (2)
   2: OPEN1 (4)
   4:   DIGIT (5)
   5: CLOSE1 (7)
   7: STAR (19)
   8:   ANYOF[\1 ][] (0)
  19: REF1 (21)
  21: EOL (22)
  22: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1 
Compiling REx "^(\d)(?:\1| )*\1$"
synthetic stclass "ANYOF[0-9][{unicode_all}]".
Final program:
   1: BOL (2)
   2: OPEN1 (4)
   4:   DIGIT (5)
   5: CLOSE1 (7)
   7: CURLYX[1] {0,32767} (17)
   9:   BRANCH (12)
  10:     REF1 (16)
  12:   BRANCH (FAIL)
  13:     EXACT < > (16)
  15:   TAIL (16)
  16: WHILEM[1/1] (0)
  17: NOTHING (18)
  18: REF1 (20)
  20: EOL (21)
  21: END (0)
floating ""$ at 1..2147483647 (checking floating) stclass ANYOF[0-9][{unicode_all}] anchored(BOL) minlen 1 
/tmp/a syntax OK
Freeing REx: "^(\d)[\1 ]*\1$"
Freeing REx: "^(\d)(?:\1| )*\1$"

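Inside a character class, a backreference is just an ordinary octal escape! [\1 ] compiles to ANYOF[\1 ], meaning the character \x01 or a space, not "the digit captured by group 1". So [\1 ]* can only consume spaces, and "3 33 3 3" cannot match the first regex.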
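Python's `re` documentation states the same rule: inside `[` and `]`, numeric escapes are treated as characters. A few assertions make the consequence concrete (a sketch run against Python's `re`, not Perl):

```python
import re

# Inside [...], \1 is the octal escape for U+0001, not a backreference,
# so this class means "U+0001 or a space".
cls = re.compile(r"[\1 ]")
assert cls.fullmatch("\x01")       # the control character U+0001
assert cls.fullmatch(" ")
assert cls.fullmatch("3") is None  # digits are not in the set

# Even when group 1 exists, the class still contains U+0001, not the digit.
pat = re.compile(r"(\d)[\1 ]")
assert pat.fullmatch("3\x01")
assert pat.fullmatch("33") is None
```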

Upvotes: 2
