Reputation: 582
I'm trying to write a regular expression for Java that matches if there is a semicolon that does not have two (or more) leading '-' characters.
I'm only able to get the opposite working: A semicolon that has at least two leading '-' characters.
([\-]{2,}.*?;.*)
But I need something like
([^([\-]{2,})])*?;.*
I'm somehow not able to express 'not at least two - characters'.
Here are some examples I need to evaluate with the expression:
; -- a : should match
-- a ; : should not match
-- ; : should not match
--; : should not match
-;- : should match
---; : should not match
-- semicolon ; : should not match
bla ; bla : should match
bla : should not match (; is mandatory)
-;--; : should match (the first occurring semicolon must not have two or more consecutive leading '-')
Upvotes: 5
Views: 1752
Reputation: 124275
It seems that this regex matches what you want:
String regex = "[^-]*(-[^-]+)*-?;.*";
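To try this pattern against the examples from the question, a small harness like the one below may help. It is only a sketch: the class name DashSemicolonCheck and the helper accepts() are made up for illustration, and it assumes the whole input is tested with String.matches().

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DashSemicolonCheck {
    // Regex from this answer; String.matches() requires it to cover the whole input.
    static final String REGEX = "[^-]*(-[^-]+)*-?;.*";

    // Hypothetical helper: true if the entire input matches REGEX.
    static boolean accepts(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        // Inputs and expected results copied from the question.
        Map<String, Boolean> cases = new LinkedHashMap<>();
        cases.put("; -- a", true);
        cases.put("-- a ;", false);
        cases.put("-- ;", false);
        cases.put("--;", false);
        cases.put("-;-", true);
        cases.put("---;", false);
        cases.put("-- semicolon ;", false);
        cases.put("bla ; bla", true);
        cases.put("bla", false);
        cases.put("-;--;", true);

        for (Map.Entry<String, Boolean> e : cases.entrySet()) {
            boolean actual = accepts(e.getKey());
            System.out.println("\"" + e.getKey() + "\" expected=" + e.getValue() + " actual=" + actual);
        }
    }
}
```

Printing expected and actual side by side makes it easy to swap in another candidate regex and see which example breaks.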
Explanation: matches() will accept a string that:
[^-]* : can start with non-dash characters
(-[^-]+)*-?; : is a bit tricky, because before we match ; we need to make sure that no - has another - right after it, so:
(-[^-]+)* : each - has at least one non-dash character after it
-? : or a - is placed right before ;
;.* : if the earlier conditions were fulfilled, we can accept ; and any characters after it (.*)
More readable version, but probably a little slower:
((?!--)[^;])*;.*
Explanation:
To make sure that there is a ; in the string we can use .*;.* in matches().
But we need to add some conditions on the characters before the first ;.
So to make sure that the matched ; is the first one, we can write the regex as
[^;]*;.*
which means:
[^;]* : zero or more non-semicolon characters
; : the first semicolon
.* : zero or more of any characters (actually . can't match line separators like \n or \r)
So now all we need to do is make sure that the character matched by [^;] is not part of --. To do so we can use look-around mechanisms, for instance:
(?!--)[^;] : before matching [^;], (?!--) checks that the next two characters are not --; in other words, the character matched by [^;] can't be the first - in a series of two --
[^;](?<!--) : after matching [^;], (?<!--) checks that the regex engine can't find -- by looking back two positions; in other words, the character matched by [^;] can't be the last character in a series of --.
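The two look-around variants can be compared side by side with a sketch like this. The lookahead pattern is the one given in this answer; the full lookbehind pattern ([^;](?<!--))*;.* is assembled here by analogy from the fragment above, so treat it as an assumption. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class LookaroundVariants {
    // Lookahead form: no character before the first ';' may start a "--" pair.
    static final Pattern LOOKAHEAD = Pattern.compile("((?!--)[^;])*;.*");
    // Lookbehind form (assembled by analogy): no character before the first ';'
    // may be the second '-' of a "--" pair.
    static final Pattern LOOKBEHIND = Pattern.compile("([^;](?<!--))*;.*");

    static boolean viaLookahead(String s) {
        return LOOKAHEAD.matcher(s).matches();
    }

    static boolean viaLookbehind(String s) {
        return LOOKBEHIND.matcher(s).matches();
    }

    public static void main(String[] args) {
        // A few of the inputs from the question.
        String[] inputs = {"; -- a", "--;", "-;-", "---;", "bla ; bla", "bla", "-;--;"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" lookahead=" + viaLookahead(s)
                    + " lookbehind=" + viaLookbehind(s));
        }
    }
}
```

Both patterns are compiled once and reused, which is usually preferable to calling String.matches() in a loop.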
Upvotes: 2
Reputation: 3625
You need a negative lookahead!
This regex will match any string which does not start with your original match pattern:
(?!-{2,}.*?;.*).*?;.*
This regex matches a string which contains a semicolon, but not one occurring after 2 or more dashes at the start of the string.
Upvotes: 0
Reputation: 75242
I think this is what you're looking for:
^(?:(?!--).)*;.*$
In other words, match from the start of the string (^), zero or more characters (.*) followed by a semicolon. But replacing the dot with (?:(?!--).) causes it to match any character unless it's the beginning of a two-hyphen sequence (--).
If performance is an issue, you can exclude the semicolon as well, so it never has to backtrack:
^(?:(?!--|;).)*;.*$
EDIT: I just noticed your comment that the regex should work with the matches() method, so I padded it out with .*. The anchors aren't really necessary, but they do no harm.
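To illustrate the remark about the anchors, here is a small sketch comparing matches() and find() on the input --; from the question. The unanchored pattern is an illustrative variant that is not part of this answer, and the class and helper names are made up.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Second pattern from this answer, with anchors.
    static final Pattern ANCHORED = Pattern.compile("^(?:(?!--|;).)*;.*$");
    // Same pattern with the anchors removed (illustrative variant only).
    static final Pattern UNANCHORED = Pattern.compile("(?:(?!--|;).)*;.*");

    // matches() must consume the entire input.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // find() may start the match at any position.
    static boolean findsAnywhere(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String input = "--;"; // should be rejected according to the question
        System.out.println(matchesWhole(ANCHORED, input));    // false
        System.out.println(matchesWhole(UNANCHORED, input));  // false
        System.out.println(findsAnywhere(ANCHORED, input));   // false
        System.out.println(findsAnywhere(UNANCHORED, input)); // true: match starts at the second '-'
    }
}
```

So with matches() the anchors change nothing, but they keep the pattern safe if it is later used with find().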
Upvotes: 0
Reputation: 785601
How about using this regex in Java:
[^;]*;(?<!--[^;]{0,999};).*
The only caveat is that it works with at most 999 characters between -- and ;.
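The caveat can be seen with a sketch like the following, which builds inputs with a gap just inside and just outside the 999-character bound. The class name, the helper dashesThenGap() and the filler character 'a' are arbitrary choices for illustration.

```java
class BoundedLookbehindDemo {
    // Regex from this answer; Java needs a bounded quantifier inside the lookbehind.
    static final String REGEX = "[^;]*;(?<!--[^;]{0,999};).*";

    // Builds "--", then n filler characters, then ";".
    static String dashesThenGap(int n) {
        StringBuilder sb = new StringBuilder("--");
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append(';').toString();
    }

    public static void main(String[] args) {
        // Gap within the bound: the lookbehind sees the "--" and rejects the input.
        System.out.println(dashesThenGap(999).matches(REGEX));
        // Gap beyond the bound: the lookbehind cannot reach the "--" any more,
        // so the input is accepted even though it should not be.
        System.out.println(dashesThenGap(1000).matches(REGEX));
    }
}
```

If longer inputs are possible, the bound would have to be raised accordingly, or one of the lookahead-based patterns used instead.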
Upvotes: 0
Reputation: 70929
How about just splitting the string along -- and, if there are two or more substrings, checking if the last one contains a semicolon?
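A regex-free check in this spirit might look like the sketch below. Note that it does not literally split and inspect the last substring: going by the question's example -;--; (which should match), it assumes the rule is "the first semicolon must come before the first --" and uses indexOf() for that. The class and method names are made up for illustration.

```java
class FirstSemicolonCheck {
    // Hypothetical helper implementing the assumed rule:
    // a semicolon exists and no "--" starts before the first one.
    static boolean accepts(String input) {
        int semicolon = input.indexOf(';');
        if (semicolon < 0) {
            return false; // the semicolon is mandatory
        }
        int doubleDash = input.indexOf("--");
        // Accept when there is no "--" at all, or the first "--" starts after the first ';'.
        return doubleDash < 0 || doubleDash > semicolon;
    }

    public static void main(String[] args) {
        // A few of the inputs from the question.
        String[] inputs = {"; -- a", "-- a ;", "-;-", "---;", "bla", "-;--;"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

Unlike the regex versions, this does not care about line separators, since it never relies on the dot.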
Upvotes: 0