Юрий Светлов

Reputation: 1750

Regex to match all characters except a word

How do I write a regex that matches all characters except a particular word?

I need to find all characters except a whole word.

(.*) finds all characters.

[^v] finds all characters except the letter v.

But how do I find all characters except a word?

Solution (based on the answers below):

((?:(?!here any word for block)[\s\S])*?)

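or, if the text contains no line breaks (. does not match newline characters):

((?:(?!here any word for block).)*?)

For example, to block the word video:

((?:(?!video)[\s\S])*?)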
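To see what this "tempered" token does, it can be anchored at the start of a string so that it runs up to the blocked word. This is a sketch only: the sample text and variable names are made up. Because nothing follows the group here, the greedy * is used; with the lazy *? from the solution above, something after the group has to force it to expand, otherwise it happily matches the empty string.

```javascript
// Tempered token: consume one character at a time ([\s\S] also matches
// newlines), but only where "video" does not start at that position.
const tempered = /^((?:(?!video)[\s\S])*)/;

// Made-up sample text spanning two lines.
const sample = 'some text\nmore text video and the rest';

const m = sample.match(tempered);
console.log(JSON.stringify(m[1])); // "some text\nmore text "

// The "." variant stops at the first newline, because "." does not match \n.
const dotOnly = /^((?:(?!video).)*)/;
console.log(JSON.stringify(sample.match(dotOnly)[1])); // "some text"
```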


I want to find every |...| token except |end|, and replace all of them except |end|.

What I tried:

I need all tokens except |end|:

var str = '|video| |end| |water| |sun| |cloud|';
// Or it may be:
//var str = '|end| |video| |water| |sun| |cloud|';
//var str = '|cloud| |video| |water| |sun| |end|';

str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);

function test_fun2(match, p1, offset, str_full) {
  console.log("--------------");
  p1 = "["+p1+"]";
  console.log(p1);
  console.log("--------------");
  return p1;
}

Console output:

--------------
[video]
--------------
--------------

--------------
--------------

--------------
--------------

--------------

Example of what I need:

Match any characters except [video](

input - '[video](text-1 *******any symbols except: "[video](" ******* [video](text-2 any symbols) [video](text-3 any symbols) [video](text-4 any symbols) [video](text-5 any symbols)'

output - <div>text-1 *******any symbols except: "[video](" *******</div> <div>text-2 any symbols</div><div>text-3 any symbols</div><div>text-4 any symbols</div><div>text-5 any symbols</div>

Upvotes: 2

Views: 1902

Answers (3)

Wiktor Stribiżew

Reputation: 626893

Scenario 1

Use the best trick ever:

One key to this technique, a key to which I'll return several times, is that we completely disregard the overall matches returned by the regex engine: that's the trash bin. Instead, we inspect the Group 1 matches, which, when set, contain what we are looking for.

Solution:

s = s.replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) { 
    return $1 ? "[" + $1 + "]" : $0; 
});

Details

  • \|end\| - |end| is matched
  • | - or
  • \|([^|]*)\| - | is matched, any 0+ chars other than | are captured into Group 1, and then | is matched.

If Group 1 matched (checked with $1 ?), the replacement is made; otherwise $0, the whole match, is returned unchanged into the result.

JS test:

console.log(
   "|video| |end| |water| |sun| |cloud|".replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) { 
        return $1 ? "[" + $1 + "]" : $0; 
    })
)
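One subtlety: the $1 ? test also treats an empty capture (from an empty || token) like the |end| branch, so || is left as is. In JavaScript, a capturing group that did not take part in the match is passed to the replacement function as undefined, so testing for undefined separates the two cases. A sketch; the helper name wrapAllExceptEnd is hypothetical, and whether || should become [] is a design choice you may not want:

```javascript
// Hypothetical helper: wrap every |...| token in brackets, except |end|.
function wrapAllExceptEnd(s) {
  return s.replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) {
    // $1 is undefined when the \|end\| alternative matched,
    // and "" (not undefined) for an empty || token.
    return $1 !== undefined ? "[" + $1 + "]" : $0;
  });
}

console.log(wrapAllExceptEnd("|video| |end| || |sun|"));
// [video] |end| [] [sun]
```

With the original $1 ? check, the same input would keep || unchanged.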

Scenario 2

Use

.replace(/\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g, '<div>$1</div>')

See the regex demo

Details

  • \[ - a [ char
  • (?!end]) - no end] allowed right after the current position
  • [^\]]* - 0+ chars other than ]
  • ] - a ] char
  • \( - a ( char
  • ((?:(?!\[video]\()[\s\S])*?) - Group 1, which captures any char ([\s\S]), 0 or more times but as few as possible (*?), that does not start a [video]( char sequence
  • \) - a ) char.
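The Scenario 2 pattern can be tried on a shortened input. This is a sketch on a simplified, made-up string (not the question's full input) in which every [video]( block is closed by its own ), plus an [end](...) block to show the (?!end]) guard leaving it alone:

```javascript
// Regex from Scenario 2: [name]( ... ) where name is not "end",
// capturing the body lazily and never running into another "[video](".
const re = /\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g;

const input = "[end](keep me) [video](text-2 any symbols) [video](text-3 any symbols)";
const output = input.replace(re, "<div>$1</div>");

console.log(output);
// [end](keep me) <div>text-2 any symbols</div> <div>text-3 any symbols</div>
```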

Upvotes: 4

VLAZ

Reputation: 28996

You are on the right track. Here is what you need to do with regex:

var str = '|video| |end| |water| |sun| |cloud|';

console.log(str.replace(/(?!\|end\|)\|(\S*?)\|/gm, test_fun2));

function test_fun2(match, p1, offset, str_full) {
  return "["+p1+"]";
}

And here is what was wrong: you placed your negative lookahead after the | character. That means the matching engine does the following:

  1. Match |video|, because the pattern works with it.
  2. Grab the next | (the one before end).
  3. See that the next text is end, which the negative lookahead forbids, so this match attempt fails.
  4. Try again from the | immediately after end.
  5. Grab the space and the next | character, since this passes the negative lookahead and also works with .*?
  6. Continue grabbing the intermediate | | sequences, because the | at the beginning of each word was consumed by the previous match.

So you end up matching the following things

var str = '|video| |end| |water| |sun| |cloud|';
           ^^^^^^^     ^^^     ^^^   ^^^
|video| ______|         |       |     |
| | ____________________|       |     |
| | ____________________________|     |
| | __________________________________|

All because the match attempt at |end was rejected.

You can see this if you print out the matches:

var str = '|video| |end| |water| |sun| |cloud|';

str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);

function test_fun2(match, p1, offset, str_full) {
  console.log(match, p1, offset);
}

You will see that the second, third, and fourth matches are | |, that the captured p1 is a single space (not very visible, but it's there), and that they were found at offsets 12, 20, and 26:

|video| |end| |water| |sun| |cloud|
01234567890123456789012345678901234
            ^       ^     ^
12 _________|       |     |
20 _________________|     |
26 _______________________|
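Instead of using replace just for logging, String.prototype.matchAll (available since ES2020) returns every match together with its groups and index. A sketch that lists what the question's regex actually matches; nothing is replaced here:

```javascript
const str = '|video| |end| |water| |sun| |cloud|';

// The regex from the question: the lookahead sits after the opening |.
const questionRe = /\|((?!end|end$).*?)\|/g;

for (const m of str.matchAll(questionRe)) {
  // m[0] = whole match, m[1] = Group 1, m.index = offset in str
  console.log(JSON.stringify(m[0]), JSON.stringify(m[1]), m.index);
}
// "|video|" "video" 0
// "| |" " " 12
// "| |" " " 20
// "| |" " " 26
```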

The change I made was to look explicitly for the whole |end| token in a negative lookahead placed before the opening |, and to match only non-whitespace characters, so you don't grab | | again.
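A side effect of looking ahead for the whole |end| token, rather than just end, is that words which merely start with end are no longer blocked (the question's (?!end|end$) lookahead refuses to start a match at |endless|). A quick check on a made-up sample string:

```javascript
// Lookahead before the opening |, checking for the exact |end| token.
const fixedRe = /(?!\|end\|)\|(\S*?)\|/g;

// "endless" begins with "end", but it is not the exact |end| token.
const sample = '|endless| |end| |video|';
const result = sample.replace(fixedRe, (match, p1) => "[" + p1 + "]");

console.log(result); // [endless] |end| [video]
```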

It's also worth noting that you can move your filtering logic into the replacement callback instead of the regex. This simplifies the regex but makes the replacement more complex. Still, it's a fair trade-off, as complex conditions are usually easier to maintain in code than in a regex:

var str = '|video| |end| |water| |sun| |cloud|';

//capturing word characters - an alternative to "non-whitespace"
console.log(str.replace(/\|(\w*)\|/gm, test_fun2)); 

function test_fun2(match, p1, offset, str_full) {
  if (p1 === 'end') {
    return match;
  } else {
    return "[" + p1 + "]"
  }
}
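Moving the filter into the callback also makes it straightforward to exclude several words. A sketch, where the excluded set (end and sun here) and the helper name wrapExceptExcluded are illustrative assumptions:

```javascript
// Hypothetical list of words to leave untouched.
const excluded = new Set(['end', 'sun']);

function wrapExceptExcluded(str) {
  // Match every |word| token; the callback decides whether to rewrite it.
  return str.replace(/\|(\w*)\|/g, (match, p1) =>
    excluded.has(p1) ? match : "[" + p1 + "]"
  );
}

console.log(wrapExceptExcluded('|video| |end| |water| |sun| |cloud|'));
// [video] |end| [water] |sun| [cloud]
```

A Set keeps the lookup cheap as the list grows; an array with includes works just as well for a handful of words.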

Upvotes: 2

Niet the Dark Absol

Reputation: 324650

Something like this is better done in multiple steps. Also, if you're matching stuff, you should use match.

var str = '|video| |end| |water| |sun| |cloud|';
var matches = str.match(/\|.*?\|/g) || []; // match() returns null when nothing matches

// strip pipe characters...
matches = matches.map(m=>m.slice(1,-1));

// filter out unwanted words
matches = matches.filter(m=>!['end'].includes(m));
           // this allows you to add more filter words easily
           // if you'll only ever need "end", just do (m=>m!='end')

console.log(matches); // ["video","water","sun","cloud"]

Notice how much easier it is to understand what's going on, and how much easier this is to maintain and change in the future.

Upvotes: 2
