Reputation: 182
I need help with a regex that is tricky (for me, at least!), and I hope to learn enough to write such patterns myself in the future.
I need to match all of the following IDs:
#1
#12
#123
#1234
#5069
#316&
#316.
#316;
I do not want to match numbers with leading zeros, numbers followed by ] or [, or numbers enclosed in ():
#0155
#0000155
#1123]
#1123[
(#1125)
I have come up with something like this: (#[1-9]\d{0,}), but it matches all of the above. So I tried different things, like:
"(#[1-9]\\d{0,})([\\s,<\\.:&;\\)])"
"(#[1-9]+)([\\s,<\\.])"
"(?m)(#[1-9]+)(.,\(,\))"
But what I really want is for (#[1-9]\d{0,}) to match all numbers EXCEPT those followed by [ or ] or ( or ).
How can I express something like this in a regex?
P.S.: The Regex needs to be used in Java.
Maybe someone can help me solve this; even better if they can explain how they arrived at the solution, so I can learn something new and help others who struggle with the same problem.
Kind regards!
Upvotes: 2
Views: 93
Reputation: 626691
You can use the following solution:
#[1-9]\d*(?![\[\])])\b[&.;]?
See demo
REGEX:
# - Matches # literally
[1-9] - 1 digit from 1 to 9
\d* - 0 or more digits
(?![\[\])]) - A negative lookahead making sure there is no [, ] or ) after the digits
\b - A word boundary
[&.;]? - An optional (?) character group matching &, . or ; literally.
Sample code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "#1\n#12\n#123\n#1234\n#5069\n#316&\n#316.\n#316;\nand not matches (leading zeros) and numbers that end with ] or [ or are between ().\n\n#0155\n#0000155\n#1123]\n#1123[\n(#1125)";
String rx = "#[1-9]\\d*(?![\\[\\])])\\b[&.;]?";
Pattern ptrn = Pattern.compile(rx);
Matcher m = ptrn.matcher(str);
// Print every match found in the input
while (m.find()) {
    System.out.println(m.group(0));
}
See IDEONE demo
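To see why the \b is there, here is a small sketch comparing the pattern with and without it. The class name WordBoundaryDemo, the helper findAll and the sample input are made up for illustration. Without the boundary, the engine can give back the last digit to satisfy the lookahead, so #1123] yields a partial match #112:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // The pattern from above with the \b removed (for comparison only).
    static final Pattern WITHOUT_BOUNDARY =
            Pattern.compile("#[1-9]\\d*(?![\\[\\])])[&.;]?");
    // The pattern from above, unchanged.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("#[1-9]\\d*(?![\\[\\])])\\b[&.;]?");

    // Hypothetical helper: collects every match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "#1123] (#1125) #316; #0155";
        // \d* backtracks to "112", where the lookahead sees a digit and passes.
        System.out.println(findAll(WITHOUT_BOUNDARY, input)); // [#112, #112, #316;]
        // \b rejects any position that falls between two digits.
        System.out.println(findAll(WITH_BOUNDARY, input));    // [#316;]
    }
}
```

The boundary does not stop the backtracking itself; it only makes every shortened attempt fail, because the position between two digits is never a word boundary.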
UPDATE
You can achieve the expected results with atomic grouping that prevents the regex engine from backtracking into it.
String rx = "#(?>[1-9]\\d*)(?![\\[\\])])[^\\w&&[^\n]]?";
In plain words, the check for brackets is performed only once, after the last digit has been matched. See updated demo.
The [^\\w&&[^\n]]? pattern optionally matches any non-word character (anything other than a letter, digit or underscore) except a newline. The newline is excluded from the character class using character class intersection.
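A minimal sketch of the atomic group on its own, with the optional trailing character class left out so that only the effect of (?>...) is visible. The class name AtomicGroupDemo, the helper findIds and the shortened input are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AtomicGroupDemo {
    // Atomic group around the digits, followed by the bracket lookahead.
    // The optional trailing character class is deliberately omitted here.
    static final Pattern ID = Pattern.compile("#(?>[1-9]\\d*)(?![\\[\\])])");

    // Hypothetical helper: collects every ID matched in input.
    static List<String> findIds(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ID.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "#1\n#12\n#316&\n#0155\n#1123]\n#1123[\n(#1125)";
        // Once the group has consumed all digits it cannot give any back,
        // so a failed lookahead rejects the whole attempt at that '#'.
        System.out.println(findIds(input)); // [#1, #12, #316]
    }
}
```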
Upvotes: 3
Reputation: 1984
I am not able to test this in Java at the moment, but how about
"^#[1-9][0-9]*[&.;]?$"
(Any string starting with a '#', then a character from 1-9, then zero or more characters from 0-9, then a '&', '.' or ';' or nothing, end string)
EDIT: This only works if every ID to check is in its own string; otherwise you'd need one of the examples from the other answers.
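For the one-ID-per-string case, a rough sketch could look like this. The class name AnchoredIdCheck, the method isValidId and the candidate list are hypothetical:

```java
import java.util.regex.Pattern;

public class AnchoredIdCheck {
    // The anchored pattern from above, compiled once.
    static final Pattern WHOLE_ID = Pattern.compile("^#[1-9][0-9]*[&.;]?$");

    // Hypothetical helper: true only if the entire string is a valid ID.
    static boolean isValidId(String candidate) {
        return WHOLE_ID.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"#1", "#316&", "#0155", "#1123]", "(#1125)"};
        for (String c : candidates) {
            // Prints true for "#1" and "#316&", false for the other three.
            System.out.println(c + " -> " + isValidId(c));
        }
    }
}
```

Since matches() already requires the whole string to match, the ^ and $ anchors are redundant here, but they keep the pattern usable with find() as well.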
Upvotes: 0
Reputation: 174696
You may use a possessive quantifier.
"#[1-9]\\d*+(?![\\[\\])])"
\\d*+ matches zero or more digits greedily, and the + that follows the * won't let the regex engine backtrack.
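The difference from the plain greedy \\d* can be sketched like this. The class name PossessiveDemo, the helper findAll and the sample input are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    // Greedy digits: the engine may give digits back when the lookahead fails.
    static final Pattern GREEDY = Pattern.compile("#[1-9]\\d*(?![\\[\\])])");
    // Possessive digits: once consumed, they are never given back.
    static final Pattern POSSESSIVE = Pattern.compile("#[1-9]\\d*+(?![\\[\\])])");

    // Hypothetical helper: collects every match of p in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "#5069 #1123] (#1125) #316;";
        System.out.println(findAll(GREEDY, input));     // [#5069, #112, #112, #316]
        System.out.println(findAll(POSSESSIVE, input)); // [#5069, #316]
    }
}
```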
Add an optional \\W if you also want to match the non-word character that follows.
"#[1-9]\\d*+(?![\\[\\])])\\W?"
Upvotes: 2