user3186555

not-group in regex

So I understand that [^A-Za-z] would match any character that's not a letter.

Is there any way to do this with a group? For example, something like (?^:&) that would match any sequence of characters that is not the sequence &.

NOTE: As Mark Reed pointed out, matching an empty string would be pointless, since an empty string is trivially a sequence of characters that is not the target sequence. I would therefore like the regex to match as many characters as possible.

FOR EXAMPLE:

In Ben & Jerry's, the matches would be "Ben " and " Jerry's" (note that the whitespace after Ben and before Jerry's is captured too).

NOTE: If possible, please do not use lookbehinds, because I will be using the regex in a JS script, and JavaScript does not support lookbehinds.

Upvotes: 5

Views: 8052

Answers (3)

Greg Bacon

Reputation: 139471

See Randal’s Rule.

Randal's Rule

Randal Schwartz (author of Learning Perl) says:

Use capturing when you know what you want to keep.

Use split when you know what you want to throw away.

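var s = "Ben & Jerry's";
var a = s.split(/&/);  // throw away the separator, keep everything else
// Brackets make the whitespace kept in each piece visible: [Ben ][ Jerry's]
document.body.innerHTML = "<pre>[" + a.join("][") + "]</pre>";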
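
The same idea carries over to a multi-character separator such as &amp;. The following is a minimal sketch that runs without a browser; the helper name chunksBetween is hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: return the pieces of `text` that lie between
// occurrences of the literal separator `sep`.
function chunksBetween(text, sep) {
    // Splitting on a plain string avoids any regex escaping concerns.
    return text.split(sep).filter(function (chunk) {
        // Drop the empty pieces produced by leading, trailing,
        // or adjacent separators.
        return chunk.length > 0;
    });
}

var parts = chunksBetween("Ben &amp; Jerry's", "&amp;");
console.log(parts); // ["Ben ", " Jerry's"]
```

Whether empty pieces should be dropped depends on the use case; the filter is included here only because the question asks for non-empty matches.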

To show how much work the negative lookahead (?!...) saves us, here is the equivalent regex, written without lookahead, that matches a string that does not contain the sequence &amp;:

^([^&]|&+[^&a]|(&+a)+([^&m]|&+[^&a])|(&+a)+m((&+a)+m)*([^&p]|&+[^&a]|(&+a)+([^&m]|&+[^&a]))|(&+a)+m((&+a)+m)*p((&+a)+m((&+a)+m)*p)*([^&;]|&+[^&a]|(&+a)+([^&m]|&+[^&a])|(&+a)+m((&+a)+m)*([^&p]|&+[^&a]|(&+a)+([^&m]|&+[^&a]))))*(&+|(&+a)+(&+)?|(&+a)+m((&+a)+m)*(&+|(&+a)+(&+)?)?|(&+a)+m((&+a)+m)*p((&+a)+m((&+a)+m)*p)*(&+|(&+a)+(&+)?|(&+a)+m((&+a)+m)*(&+|(&+a)+(&+)?)?)?)?$
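
For comparison, a lookahead-based pattern for the same job might look like the sketch below. The exact pattern is an assumption for illustration (the answer only mentions (?!...) in passing), and the variable name noAmpEntity is hypothetical.

```javascript
// Matches only strings that contain no "&amp;" anywhere.
// At every position, (?!&amp;) asserts that "&amp;" does not start here,
// then [\s\S] consumes one character (newlines included).
var noAmpEntity = /^(?:(?!&amp;)[\s\S])*$/;

console.log(noAmpEntity.test("Ben and Jerry's"));   // true
console.log(noAmpEntity.test("Ben &amp; Jerry's")); // false
console.log(noAmpEntity.test("Ben & Jerry's &am")); // true: partial sequences are allowed
```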

Upvotes: 0

Wiktor Stribiżew

Reputation: 626826

What you need is a regex with two alternatives: the first matches the sequence you want to skip, and the second captures everything else into Group 1 using a tempered greedy token (or an unrolled version for better performance, if you only have 2 or 3):

&amp;|((?:(?!&amp;)[\s\S])+)

See the regex demo (an unrolled version: &amp;|([^&]*(?:&(?!amp;)[^&]*)*)).

The pattern:

  • &amp; - matches the literal &amp; entity
  • | - or
  • ((?:(?!&amp;)[\s\S])+) - matches and captures into Group 1 any chunk of text (1+ characters) in which no character is the starting point of a &amp; sequence. Since this is for JS, you need [\s\S] (or [^]) to match any character including a newline. If you only intend to match within lines, use . instead.

var re = /&amp;|((?:(?!&amp;)[\s\S])+)/g;
var str = 'abc Ben &amp; Jerry\'s    foobar ssss  sss  sss &amp;\n\n\nsssss&amp;sssss     &amp;\n\nsssss&amp;sssss     &amp;sssss\n&amp;sssss&amp;\n&amp;&amp;';
var res = [];
var m;                             // Declared to avoid creating an implicit global

while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {// A part of code only necessary for the
        re.lastIndex++;            // unrolled pattern (as it can match empty string)
    }
    if (m[1] !== undefined) {      // Group 1 is undefined when &amp; itself matched
        res.push(m[1]);            // Only collect the captured texts
    }
}
document.body.innerHTML = "<pre>BEFORE:<br/>" + str.replace(/&/g, '&amp;') + "</pre>";
document.body.innerHTML += "<pre>AFTER:<br/>" + res.join("") + "</pre>";
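
The unrolled pattern mentioned above can be exercised outside the browser as well. This is only a sketch: the function name chunksOutside is hypothetical, and it assumes an environment that provides String.prototype.matchAll (ES2020), which steps past empty matches on its own.

```javascript
// Hypothetical helper: collect every chunk of text that lies outside
// the "&amp;" sequences, using the unrolled pattern.
function chunksOutside(text) {
    var re = /&amp;|([^&]*(?:&(?!amp;)[^&]*)*)/g;
    var chunks = [];
    for (var m of text.matchAll(re)) {
        // m[1] is undefined when "&amp;" itself matched, and "" when the
        // unrolled alternative matched an empty string; skip both.
        if (m[1]) {
            chunks.push(m[1]);
        }
    }
    return chunks;
}

console.log(chunksOutside("Ben &amp; Jerry's")); // ["Ben ", " Jerry's"]
console.log(chunksOutside("AT&T &amp; co"));     // ["AT&T ", " co"]
```

The second call shows that a lone & which does not start &amp; stays inside the captured chunk.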

Upvotes: 4

Aminah Nuraini

Reputation: 19156

Easy:

(.*?)(?:&amp;)|((?!&amp;).*)$

Explanation:

  1. (.*?): Lazily (non-greedily) capture everything up to the next &amp;.
  2. (?:&amp;): ?: makes this a non-capturing group, i.e. a group whose value you don't need to retrieve.
  3. ((?!&amp;).*)$: Capture the rest of the string, which does not start with &amp;.
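
A possible way to apply this pattern globally is sketched below. The helper name collectChunks is hypothetical, and the code assumes String.prototype.matchAll (ES2020) is available. Note that the wanted text lands in Group 1 for the first alternative and in Group 2 for the second, so both have to be checked.

```javascript
var re = /(.*?)(?:&amp;)|((?!&amp;).*)$/g;

// Hypothetical helper: gather the non-empty captures from either group.
function collectChunks(text) {
    var out = [];
    for (var m of text.matchAll(re)) {
        // Group 1 is set when "&amp;" follows the chunk,
        // Group 2 when the chunk runs to the end of the string.
        var chunk = m[1] !== undefined ? m[1] : m[2];
        if (chunk) {
            out.push(chunk); // skip empty captures
        }
    }
    return out;
}

console.log(collectChunks("Ben &amp; Jerry's")); // ["Ben ", " Jerry's"]
```

Because . does not match line terminators and $ (without the m flag) only matches at the very end of the input, text on an earlier line that is not followed by &amp; on that same line is skipped by this pattern.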

Upvotes: 3
