PaulSchell

Reputation: 182

Regex match certain IDs

I need help with a regex that is tricky (for me, at least!), and I hope to learn enough to write such patterns myself in the future.

I need to match all of the following IDs:

#1
#12
#123
#1234
#5069
#316&
#316.
#316;

I do not want to match IDs with leading zeros, IDs followed by ] or [, or IDs enclosed in ():

#0155
#0000155
#1123]
#1123[
(#1125)

I came up with something like (#[1-9]\d{0,}), but it also matches IDs from the second list, such as #1123] and (#1125). So I tried a few variations:

"(#[1-9]\\d{0,})([\\s,<\\.:&;\\)])"
"(#[1-9]+)([\\s,<\\.])"
"(?m)(#[1-9]+)(.,\(,\))"

What I really want is for (#[1-9]\d{0,}) to match all numbers EXCEPT those followed by [, ], ( or ).

How can I express something like this in a regex?

P.S.: The Regex needs to be used in Java.

Maybe someone can help me solve this. Even better if you can explain how you arrived at the solution, so I can learn something new and help others who struggle with the same problem.

Kind regards!

Upvotes: 2

Views: 93

Answers (3)

Wiktor Stribiżew

Reputation: 626691

You can use the following solution:

#[1-9]\d*(?![\[\])])\b[&.;]?

REGEX:

  • # - Matches # literally
  • [1-9] - 1 digit from 1 to 9
  • \d* - 0 or more digits
  • (?![\[\])]) - A negative lookahead making sure there is no [, ] or ) after the digits
  • \b - A word boundary; it stops the engine from backtracking to fewer digits just to satisfy the lookahead
  • [&.;]? - An optional (?) character class matching &, . or ; literally.

Sample code:

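import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "#1\n#12\n#123\n#1234\n#5069\n#316&\n#316.\n#316;\nand not matches (leading zeros) and numbers that end with ] or [ or are between ().\n\n#0155\n#0000155\n#1123]\n#1123[\n(#1125)";
        String rx = "#[1-9]\\d*(?![\\[\\])])\\b[&.;]?"; // backslashes doubled for the Java string literal
        Pattern ptrn = Pattern.compile(rx);
        Matcher m = ptrn.matcher(str);
        while (m.find()) {
            System.out.println(m.group(0)); // print each matched ID on its own line
        }
    }
}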

UPDATE

You can also achieve the expected results with an atomic group, which prevents the regex engine from backtracking into it.

String rx = "#(?>[1-9]\\d*)(?![\\[\\])])[^\\w&&[^\n]]?";

In plain words, the check for brackets is performed only after the last digit has been matched.

The [^\\w&&[^\n]]? pattern optionally matches any non-word character (anything other than a letter, digit or underscore) except a newline. The newline is excluded by intersecting the character class with [^\n].
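
To make the effect of the atomic group concrete, here is a minimal sketch that runs the same lookahead with and without it. The class name AtomicGroupDemo, the helper findAll and the sample input are made up for illustration, and the trailing optional character is left out to keep the comparison small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Plain greedy digits: the engine may give back digits to satisfy the lookahead.
    static final Pattern BACKTRACKING = Pattern.compile("#[1-9]\\d*(?![\\[\\])])");
    // Atomic group: once the digits are matched, they are never given back.
    static final Pattern ATOMIC = Pattern.compile("#(?>[1-9]\\d*)(?![\\[\\])])");

    // Hypothetical helper: collects every match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "#316 #1123] (#1125) #0155";
        System.out.println(findAll(BACKTRACKING, input)); // [#316, #112, #112]
        System.out.println(findAll(ATOMIC, input));       // [#316]
    }
}
```

Without the atomic group, the engine drops the last digit of #1123] and (#1125) so that the lookahead sees a digit instead of a bracket, which produces the unwanted #112 matches.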

Upvotes: 3

Buurman

Reputation: 1984

I am not able to test this in Java at the moment, but how about

"^#[1-9][0-9]*[&.;]?$"

(Any string starting with a '#', then a character from 1-9, then zero or more characters from 0-9, then a '&', '.' or ';' or nothing, end string)

EDIT: This only works if every ID to check is in its own string; otherwise you'd need one of the examples from the other answers.
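
A rough sketch of how this one-ID-per-string check might look in Java, assuming each candidate arrives as its own String. The class name IdValidator and the method isValidId are hypothetical; the pattern is the one from this answer.

```java
import java.util.regex.Pattern;

class IdValidator {
    // Pattern from the answer; matches() additionally requires the whole input to match.
    private static final Pattern ID = Pattern.compile("^#[1-9][0-9]*[&.;]?$");

    // Hypothetical helper: true if the whole candidate string is a valid ID.
    static boolean isValidId(String candidate) {
        return ID.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"#1", "#5069", "#316&", "#0155", "#1123]", "(#1125)"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidId(s));
        }
    }
}
```

The first three samples are accepted and the last three are rejected.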

Upvotes: 0

Avinash Raj

Reputation: 174696

You may use a possessive quantifier.

"#[1-9]\\d*+(?![\\[\\])])"

\\d*+ greedily matches zero or more digits, and the + that follows the * prevents the regex engine from backtracking into them.

Add an optional \\W if you also want to match the non-word character that follows.

"#[1-9]\\d*+(?![\\[\\])])\\W?"
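
As an illustration only, the second pattern could be applied to a line of text like this. The class name PossessiveDemo, the helper findIds and the sample input are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveDemo {
    // \d*+ is possessive: digits, once consumed, are not released for the lookahead.
    private static final Pattern ID = Pattern.compile("#[1-9]\\d*+(?![\\[\\])])\\W?");

    // Hypothetical helper: returns every ID match, including the optional trailing character.
    static List<String> findIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(text);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findIds("#316& #316. #1123] (#1125) #12"));
        // prints [#316&, #316., #12]
    }
}
```

Note that \\W also matches whitespace, so an ID followed directly by a space or a line break would include that character in the match; a narrower class such as [&.;] avoids this.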

Upvotes: 2
