Barun
Barun

Reputation: 1915

Regex checking repeat

I am trying to check a text line using regex.

1,3,4,5,8,10,12,14,19,14

Here numbers are delimited with ',' and should be non-negetive and less than or equals to 20. And also any number should not be repeated. Here is my pattern.

^(?:(?:0[1-9]|[1-9]|1[0-9]|20),)*(?:0[1-9]|[1-9]|1[0-9]|20)$

But it cant check the repeat. How can I check it ?

Upvotes: 10

Views: 1508

Answers (5)

stema
stema

Reputation: 93086

What you want to do is not that complicated. You just need to check after each matched number if this number occurs once more in the string:

^(?:(0[1-9]|[1-9]|1[0-9]|20),(?!.*\b\1\b))*(?:0[1-9]|[1-9]|1[0-9]|20)$

See it and test it here on Regexr.

In C#:

string[] myStrings = { "1",
    "1,2",
    "01,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20",
    "01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,16,17,18,19,20",
    "01,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,5",
    "01,02,03,04,05,06,07,08,13,09,10,11,12,13,14,15,16,17,18,19,20" };

Regex reg = new Regex(
    @"^
        (?:(0[1-9]|[1-9]|1[0-9]|20),
            (?!.*\b\1\b) # Fail if the before matched number occurs once more
        )*
        (?:0[1-9]|[1-9]|1[0-9]|20)
    $",
    RegexOptions.IgnorePatternWhitespace
);

foreach (string myString in myStrings)
    Console.WriteLine("{0} {1} a valid string.",
        myString,
        reg.IsMatch(myString) ? "is" : "is not"
    );

Console.ReadLine();

Upvotes: 7

poke
poke

Reputation: 388403

As you have tagged your question with both C# and Java, I’m not going to give you a code solution here, but the basic idea.

If you split the string by ,, you get a list of substrings: "1", "3" , "4", "5", "8", "10", "12", "14", "19", "14". Now, you could just loop over those and try to parse each as an integer. If it fails, it was not a number. And if it succeeds, you can easily check if it is < 0 or > 20. And you can also keep a set of numbers you already had before and check if the current one was repeated.

The bottom line is, that you shouldn’t try to use regular expressions for everything. And your language requirement is not regular anyway (if you need to remember stuff, or count things, it is usually not regular). Perl based RegExps are capable of a bit more than just regular but it is not enough here.

Solution as regular expression

As you said in the comments, one line is limited to hold at most 20 numbers. As each number is also limited to be between zero and twenty, you have a finite amount of possibilities for how the line can actually look. As such, you have a finite language (with a finite number of possible lines). Finite languages are a subset of regular languages and as such, you can “easily” represent the language with regular expressions.

The simplest solution would be to just list every possible line. So, if you had just 3 numbers per line with 5 being the highest number (just to keep things simple), the regular expression could look like this:

0,1,2|0,1,3|0,1,4|0,1,5|0,2,3|0,2,4|0,2,5|0,3,4|0,3,5|0,4,5|1,2,3|1,2,4|1,2,5|1,3,4|1,3,5|1,4,5|2,3,4

Of course, you could simplify that a lot then (maybe even more):

0,(1,(2|3|4|5)|2,(3|4|5)|3,(4|5)|4,5)|1,(2,(3|4|5)|3,(4|5)|4,5)|2,(3,(4|5)|4,5)|3,4,5

But yeah, if you have a requirements that make a language finite, it also gets regular, but not necessarily pretty; and I would argue that the “manual” solution is still a lot more readable and especially more flexible.

Upvotes: 4

Adam Dymitruk
Adam Dymitruk

Reputation: 129782

Since you want regex, yes, you will be limited with back references as they only go from \1 to \9. So you need to exclude pairings. Your greatest challenge is to get rid of the repeating numbers.

from http://www.regular-expressions.info/refadv.html

use (?:(\d?\d),?)+ with (?!<regex>) to ensure you have no duplicates. You can also use (?(?=<regex>)true|false)

I used this page to experiment: http://www.regextester.com/

Upvotes: 0

asgoth
asgoth

Reputation: 35829

String[] numbers = input.split(",");
Set<Integer> filtered = new TreeSet();

for(String number: numbers) {
   if(!number.startsWith("-") {
      int nbr = Integer.parseInt(number);

      if(nbr < 20) {
         filtered.add(nbr);
      }
   }
}
for(int nbr: filtered) {
   System.out.print(nbr + " ");
}

Upvotes: 1

Adam Dymitruk
Adam Dymitruk

Reputation: 129782

Regex is not the best option for this. It gets too hairy fast for repeating numbers. You might want to look at tokenization. Even simple things like looking for a pattern that is NOT present is difficult (see Regular expression to match a line that doesn't contain a word? for an example)

I would split the string by commmas, then add them to an ordered list. If using C#:

"1,2,3,4".Split(',')

to start then continue with Linq to see if your conditions are satisfied.

If you MUST do this with regex, look at iterating over the collection search returns. But this buys you very little over the solution above.

Upvotes: 2

Related Questions