Sonnenhut
Sonnenhut

Reputation: 4453

Regex not operator

Is there an NOT operator in Regexes? Like in that string : "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)"

I want to delete all \([0-9a-zA-z _\.\-:]*\) but not the one where it is a year: (2001).

So what the regex should return must be: (2001) name.

NOTE: something like \((?![\d]){4}[0-9a-zA-z _\.\-:]*\) does not work for me (the (20019) somehow also matches...)

Upvotes: 307

Views: 909033

Answers (4)

lalebarde
lalebarde

Reputation: 1816

Here is an alternative:

(\(\d{4}\))((?:\s*\([0-9a-zA-z _\.\-:]*\))*)([^()]*)(( ?\([0-9a-zA-z _\.\-:]*\))*)

Repetitive patterns are embedded in a single group with this construction, where the inner group is not a capturing one: ((:?pattern)*), which enable to have control on the group numbers of interrest.

Then you get what you want with: \1\3

Upvotes: 1

birgersp
birgersp

Reputation: 4916

You could capture the (2001) part and replace the rest with nothing.

public static string extractYearString(string input) {
    return input.replaceAll(".*\(([0-9]{4})\).*", "$1");
}

var subject = "(2001) (asdf) (dasd1123_asd 21.01.2011 zqge)(dzqge) name (20019)";
var result = extractYearString(subject);
System.out.println(result); // <-- "2001"

.*\(([0-9]{4})\).* means

  • .* match anything
  • \( match a ( character
  • ( begin capture
  • [0-9]{4} any single digit four times
  • ) end capture
  • \) match a ) character
  • .* anything (rest of string)

Upvotes: 1

Johan Sj&#246;berg
Johan Sj&#246;berg

Reputation: 49187

Not quite, although generally you can usually use some workaround on one of the forms

  • [^abc], which is character by character not a or b or c,
  • or negative lookahead: a(?!b), which is a not followed by b
  • or negative lookbehind: (?<!a)b, which is b not preceeded by a

Upvotes: 474

Joachim Sauer
Joachim Sauer

Reputation: 308021

No, there's no direct not operator. At least not the way you hope for.

You can use a zero-width negative lookahead, however:

\((?!2001)[0-9a-zA-z _\.\-:]*\)

The (?!...) part means "only match if the text following (hence: lookahead) this doesn't (hence: negative) match this. But it doesn't actually consume the characters it matches (hence: zero-width).

There are actually 4 combinations of lookarounds with 2 axes:

  • lookbehind / lookahead : specifies if the characters before or after the point are considered
  • positive / negative : specifies if the characters must match or must not match.

Upvotes: 204

Related Questions