Richard

Reputation: 582

Match first occurrence of semicolon in string, only if not preceded by '--'

I'm trying to write a regular expression for Java that matches if there is a semicolon that does not have two (or more) leading '-' characters.

I'm only able to get the opposite working: a semicolon that has at least two leading '-' characters.

([\-]{2,}.*?;.*)

But I need something like

([^([\-]{2,})])*?;.*

I'm somehow not able to express 'not at least two - characters'.

Here are some examples I need to evaluate with the expression:

; -- a           : should match
-- a ;           : should not match
-- ;             : should not match
--;              : should not match
-;-              : should match
---;             : should not match
-- semicolon ;   : should not match
bla ; bla        : should match
bla              : should not match (; is mandatory)
-;--;            : should match (the first occurring semicolon must not have two or more consecutive leading '-')

Upvotes: 5

Views: 1752

Answers (5)

Pshemo

Reputation: 124275

It seems that this regex matches what you want:

String regex = "[^-]*(-[^-]+)*-?;.*";


Explanation: matches() will accept a string that:

  • [^-]* may start with any number of non-dash characters
  • (-[^-]+)*-?; is a bit tricky: before we match ; we need to make sure that no - is followed by another -, so:
    • (-[^-]+)* each - has at least one non-dash character after it
    • -? or a single - is placed right before ;
  • ;.* if the earlier conditions were fulfilled, we can accept ; and any characters (.*) after it.
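As a quick check, the regex above can be run against the samples from the question. This is only a minimal sketch: the class name DashFreePrefixDemo and the helper accepts are made up for illustration, and it assumes the pattern is applied with matches() so the whole input has to match.

```java
import java.util.regex.Pattern;

class DashFreePrefixDemo {
    // Regex copied from the answer: the text before the matched ';' may not contain "--".
    private static final Pattern PATTERN = Pattern.compile("[^-]*(-[^-]+)*-?;.*");

    // Hypothetical helper: true when the whole input matches the pattern.
    static boolean accepts(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Samples taken from the question.
        String[] samples = {
            "; -- a", "-- a ;", "-- ;", "--;", "-;-",
            "---;", "-- semicolon ;", "bla ; bla", "bla", "-;--;"
        };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + accepts(sample));
        }
    }
}
```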

More readable version, but probably a little slower:

((?!--)[^;])*;.*

Explanation:

To make sure that there is a ; in the string we can use .*;.* with matches().
But we need to add some conditions on the characters before the first ;.

So to make sure that the matched ; is the first one, we can write the regex as

[^;]*;.*

which means:

  • [^;]* zero or more non-semicolon characters
  • ; the first semicolon
  • .* zero or more of any character (strictly, . does not match line separators such as \n or \r by default)

So now all we need to do is make sure that the character matched by [^;] is not part of --. To do so we can use look-around mechanisms, for instance:

  • (?!--)[^;] before matching [^;], (?!--) checks that the next two characters are not --; in other words, the character matched by [^;] can't be the first - in a pair --
  • [^;](?<!--) after matching [^;], the engine checks that the two characters just before the current position are not --; in other words, the character matched by [^;] can't be the last - in a pair --
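The two look-around options can be compared side by side. In the sketch below the lookahead pattern is the one given in the answer; the full lookbehind pattern ([^;](?<!--))*;.* is an assumption, built by dropping the lookbehind fragment into the same skeleton. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class LookaroundVariantsDemo {
    // Lookahead variant, as written in the answer.
    static final Pattern LOOKAHEAD = Pattern.compile("((?!--)[^;])*;.*");

    // Lookbehind variant: an assumed composition of the fragment [^;](?<!--)
    // with the same [^;]*;.* skeleton.
    static final Pattern LOOKBEHIND = Pattern.compile("([^;](?<!--))*;.*");

    static boolean acceptsWithLookahead(String input) {
        return LOOKAHEAD.matcher(input).matches();
    }

    static boolean acceptsWithLookbehind(String input) {
        return LOOKBEHIND.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"; -- a", "-- a ;", "-;-", "---;", "bla ; bla", "bla", "-;--;"};
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" lookahead=" + acceptsWithLookahead(sample)
                    + " lookbehind=" + acceptsWithLookbehind(sample));
        }
    }
}
```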

Upvotes: 2

Adam Yost

Reputation: 3625

You need a negative lookahead!

This regex will match any string which does not contain your original match pattern:

(?!-{2,}.*?;.*).*?;.*

This regex matches a string which contains a semicolon, but not one occurring after 2 or more dashes.
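A small sketch of how this pattern behaves with matches(), using hypothetical names (LeadingLookaheadDemo, accepts). One thing worth noting: the lookahead is only evaluated at the start of the input, so it handles the samples from the question, but an input such as "a --;" where the dashes come later is still accepted.

```java
import java.util.regex.Pattern;

class LeadingLookaheadDemo {
    // Regex copied from the answer; the negative lookahead is checked once, at position 0.
    private static final Pattern PATTERN = Pattern.compile("(?!-{2,}.*?;.*).*?;.*");

    // Hypothetical helper: true when the whole input matches.
    static boolean accepts(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Samples from the question.
        System.out.println(accepts("; -- a"));
        System.out.println(accepts("-- a ;"));
        System.out.println(accepts("-;--;"));
        // Extra sample (not from the question): dashes that do not start the string.
        System.out.println(accepts("a --;"));
    }
}
```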


Upvotes: 0

Alan Moore

Reputation: 75242

I think this is what you're looking for:

^(?:(?!--).)*;.*$

In other words, match from the start of the string (^), zero or more characters (.*) followed by a semicolon. But replacing the dot with (?:(?!--).) causes it to match any character unless it's the beginning of a two-hyphen sequence (--).

If performance is an issue, you can exclude the semicolon as well, so it never has to backtrack:

^(?:(?!--|;).)*;.*$

EDIT: I just noticed your comment that the regex should work with the matches() method, so I padded it out with .*. The anchors aren't really necessary, but they do no harm.
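To illustrate the point about matches(), here is a minimal sketch that runs both patterns with matches() and with find(). The class and helper names are hypothetical. For the single-line samples used here the two methods agree, because the anchors and the trailing .* already force the pattern to cover the whole input.

```java
import java.util.regex.Pattern;

class TemperedDotDemo {
    // Both patterns are copied from the answer.
    static final Pattern BASIC = Pattern.compile("^(?:(?!--).)*;.*$");
    static final Pattern NO_BACKTRACK = Pattern.compile("^(?:(?!--|;).)*;.*$");

    // Whole-input match, as with String.matches().
    static boolean matchesWhole(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    // Search anywhere in the input; the ^ and $ anchors do the restricting.
    static boolean findsAnchored(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"; -- a", "-- semicolon ;", "-;-", "bla", "-;--;"};
        for (String sample : samples) {
            System.out.println("\"" + sample + "\""
                    + " basic=" + matchesWhole(BASIC, sample) + "/" + findsAnchored(BASIC, sample)
                    + " noBacktrack=" + matchesWhole(NO_BACKTRACK, sample)
                    + "/" + findsAnchored(NO_BACKTRACK, sample));
        }
    }
}
```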

Upvotes: 0

anubhava

Reputation: 785601

How about using this regex in Java:

[^;]*;(?<!--[^;]{0,999};).*

The only caveat is that it works only when there are at most 999 characters between -- and ;
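The length limit can be seen with a small experiment. This is a sketch with hypothetical names (BoundedLookbehindDemo, accepts, repeat); it builds inputs with 999 and 1000 filler characters between -- and ; to show where the bounded lookbehind stops detecting the dashes.

```java
import java.util.regex.Pattern;

class BoundedLookbehindDemo {
    // Regex copied from the answer; the lookbehind can look back over at most 999 non-';' characters.
    private static final Pattern PATTERN = Pattern.compile("[^;]*;(?<!--[^;]{0,999};).*");

    // Hypothetical helper: true when the whole input matches.
    static boolean accepts(String input) {
        return PATTERN.matcher(input).matches();
    }

    // Hypothetical helper that repeats a character; avoids depending on String.repeat.
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepts("-;-"));
        System.out.println(accepts("-- a ;"));
        // 999 filler characters: still within reach of the lookbehind.
        System.out.println(accepts("--" + repeat('x', 999) + ";"));
        // 1000 filler characters: the "--" is out of reach, so the input is accepted.
        System.out.println(accepts("--" + repeat('x', 1000) + ";"));
    }
}
```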


Upvotes: 0

Edwin Buck

Reputation: 70929

How about just splitting the string on -- and, if there are two or more substrings, checking if the last one contains a semicolon?
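A regex-free check in the same spirit might look like the sketch below. Note that it is a variation and not the exact splitting described above: it uses indexOf to compare the position of the first ; with the position of the first --, which also covers inputs such as "-;--;" from the question. The class and method names are hypothetical.

```java
class FirstSemicolonCheck {
    // True when the input contains a ';' and no "--" occurs before the first ';'.
    static boolean semicolonBeforeDashes(String input) {
        int firstSemicolon = input.indexOf(';');
        if (firstSemicolon < 0) {
            return false; // the semicolon is mandatory
        }
        int firstDashes = input.indexOf("--");
        // Either there is no "--" at all, or it starts after the first ';'.
        return firstDashes < 0 || firstDashes > firstSemicolon;
    }

    public static void main(String[] args) {
        String[] samples = {"; -- a", "-- a ;", "-;-", "---;", "bla ; bla", "bla", "-;--;"};
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + semicolonBeforeDashes(sample));
        }
    }
}
```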

Upvotes: 0
