Reputation: 1750
How do I write a regex that matches everything except a given word?
I need to match all characters except a specific word.
(.*)
- It matches all characters.
[^v]
- It matches all characters except the letter v.
But how do I match all characters except a word?
Solution (written up from the answers below):
((?:(?!here any word for block)[\s\S])*?)
or
((?:(?!here any word for block).)*?)
((?:(?!video)[\s\S])*?)
I want to find everything except |end|
and replace everything except |end|.
What I tried:
I need everything except |end|
var str = '|video| |end| |water| |sun| |cloud|';
// May be:
//var str = '|end| |video| |water| |sun| |cloud|';
//var str = '|cloud| |video| |water| |sun| |end|';
str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);
function test_fun2(match, p1, offset, str_full) {
  console.log("--------------");
  p1 = "[" + p1 + "]";
  console.log(p1);
  console.log("--------------");
  return p1;
}
Output console log:
--------------
[video]
--------------
--------------
--------------
--------------
--------------
--------------
--------------
Example of what I need:
Any characters except [video](
input - '[video](text-1 *******any symbols except: "[video](" ******* [video](text-2 any symbols) [video](text-3 any symbols) [video](text-4 any symbols) [video](text-5 any symbols)'
output - <div>text-1 *******any symbols except: "[video](" *******</div> <div>text-2 any symbols</div><div>text-3 any symbols</div><div>text-4 any symbols</div><div>text-5 any symbols</div>
Upvotes: 2
Views: 1902
Reputation: 626893
Use the best trick ever:
One key to this technique, a key to which I'll return several times, is that we completely disregard the overall matches returned by the regex engine: that's the trash bin. Instead, we inspect the Group 1 matches, which, when set, contain what we are looking for.
Solution:
s = s.replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) {
  return $1 ? "[" + $1 + "]" : $0;
});
Details
\|end\| - |end| is matched
| - or
\|([^|]*)\| - | is matched, any 0+ chars other than | are captured into Group 1, and then | is matched.
If Group 1 matched ($1 ?) the replacement occurs; otherwise $0, the whole match, is returned back to the result.
JS test:
console.log(
  "|video| |end| |water| |sun| |cloud|".replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) {
    return $1 ? "[" + $1 + "]" : $0;
  })
);
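The same idea can be wrapped in a small helper and checked against the three sample strings from the question. This is only a sketch: the function name bracketAllExcept and its escaping step are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper (name and signature are illustrative only).
// Wraps every |word| in brackets except the one word to keep as-is.
function bracketAllExcept(input, keep) {
  // Escape regex metacharacters in the word we want to leave untouched.
  const escaped = keep.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // First alternative matches the word to keep (Group 1 stays unset);
  // second alternative captures any other pipe-delimited token in Group 1.
  const re = new RegExp("\\|" + escaped + "\\||\\|([^|]*)\\|", "g");
  return input.replace(re, (whole, g1) => (g1 ? "[" + g1 + "]" : whole));
}

console.log(bracketAllExcept("|video| |end| |water| |sun| |cloud|", "end"));
console.log(bracketAllExcept("|end| |video| |water| |sun| |cloud|", "end"));
console.log(bracketAllExcept("|cloud| |video| |water| |sun| |end|", "end"));
```

Because the decision is made on whether Group 1 is set, the position of |end| in the string does not matter.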
Use
.replace(/\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g, '<div>$1</div>')
See the regex demo
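A minimal sketch for trying this pattern locally. The sample string is hypothetical (it is not the input from the question) and was chosen to include an [end](...) item that should be skipped and a [video](...) item whose text spans two lines.

```javascript
// The pattern from the answer, stored in a variable for reuse.
const re = /\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g;

// Hypothetical sample input: two [video](...) items, one of them spanning
// two lines, plus an [end](...) item that must be left untouched.
const sample = "[video](text-1) [end](skip me) [video](line 1\nline 2)";

// $1 is the text captured between the parentheses.
const result = sample.replace(re, "<div>$1</div>");
console.log(result);
```

The [\s\S] class is what lets the captured text run across the line break; a plain dot would stop at the newline.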
Details
\[ - a [ char
(?!end]) - no end] allowed right after the current position
[^\]]* - 0+ chars other than ]
] - a ] char
\( - a ( char
((?:(?!\[video]\()[\s\S])*?) - Group 1 that captures any char ([\s\S]), 0 or more occurrences, but as few as possible (*?), that does not start a [video]( char sequence
\) - a ) char.
Upvotes: 4
Reputation: 28996
You are on the right track. Here is what you need to do with regex:
var str = '|video| |end| |water| |sun| |cloud|';
console.log(str.replace(/(?!\|end\|)\|(\S*?)\|/gm, test_fun2));
function test_fun2(match, p1, offset, str_full) {
  return "[" + p1 + "]";
}
And an explanation of what was wrong: you had your negative lookahead placed after the | character. That means that the matching engine would do the following:
- match |video| because the pattern works with it
- grab the next |
- find end, which is in the negative lookahead, and drop that match
- grab the | immediately after end
- grab the space and the next | character, since this passes the negative lookahead and also works with .*?
- continue grabbing the intermediate | | sequences, because the | at the beginning of each word was consumed by the previous match
So you end up matching the following things:
var str = '|video| |end| |water| |sun| |cloud|';
           ^^^^^^^     ^^^     ^^^   ^^^
|video| ______|         |       |     |
| | ____________________|       |     |
| | ____________________________|     |
| | __________________________________|
All because the |end match was dropped.
You can see this if you print out the matches
var str = '|video| |end| |water| |sun| |cloud|';
str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);
function test_fun2(match, p1, offset, str_full) {
  console.log(match, p1, offset);
}
You will see that the second, third, and fourth matches are | |, that the captured item p1 is a single blank space (not very well displayed, but there), and that the offsets at which they were found are 12, 20, and 26:
|video| |end| |water| |sun| |cloud|
01234567890123456789012345678901234
            ^       ^     ^
12 _________|       |     |
20 _________________|     |
26 _______________________|
The change I made was to look explicitly for the |end| pattern in a negative lookahead placed before the opening |, and to match only non-whitespace characters, so you don't grab | | again.
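The offsets listed above, and the effect of the corrected pattern, can be checked side by side with matchAll. This is a sketch only; the variable names broken, fixed, and describe are made up for the illustration.

```javascript
const str = "|video| |end| |water| |sun| |cloud|";

// Original attempt: the lookahead sits after the opening pipe.
const broken = /\|((?!end|end$).*?)\|/gm;
// Corrected pattern: lookahead before the opening pipe, non-whitespace only.
const fixed = /(?!\|end\|)\|(\S*?)\|/gm;

// Collect [matched text, offset] pairs for a given pattern.
const describe = (re) => [...str.matchAll(re)].map((m) => [m[0], m.index]);

console.log(describe(broken)); // "| |" matches at offsets 12, 20, 26
console.log(describe(fixed));  // only |video|, |water|, |sun|, |cloud|
```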
It is also worth noting that you can move the filtering logic out of the regex and into the replacement callback. This simplifies the regex but makes the replacement function more complex. Still, it's a fair tradeoff, as code is usually easier to maintain than a regex once the conditions get more complex:
var str = '|video| |end| |water| |sun| |cloud|';
//capturing word characters - an alternative to "non-whitespace"
console.log(str.replace(/\|(\w*)\|/gm, test_fun2));
function test_fun2(match, p1, offset, str_full) {
  if (p1 === 'end') {
    return match;
  } else {
    return "[" + p1 + "]";
  }
}
Upvotes: 2
Reputation: 324650
Something like this is better done in multiple steps. Also, if you're matching stuff, you should use match.
var str = '|video| |end| |water| |sun| |cloud|';
var matches = str.match(/\|.*?\|/g);
// strip pipe characters...
matches = matches.map(m=>m.slice(1,-1));
// filter out unwanted words
matches = matches.filter(m=>!['end'].includes(m));
// this allows you to add more filter words easily
// if you'll only ever need "end", just do (m=>m!='end')
console.log(matches); // ["video","water","sun","cloud"]
Notice how much easier it is to understand what's going on here, and how much easier this is to maintain and change in the future as needed.
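If the filter list is likely to grow, the steps above could be folded into one function. A sketch under that assumption; the name wordsExcept and the empty-array fallback are additions for illustration, not part of the original snippet.

```javascript
// Hypothetical helper (illustrative name): returns the words found between
// pipes, leaving out any word listed in `excluded`.
function wordsExcept(input, excluded) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const tokens = input.match(/\|.*?\|/g) || [];
  return tokens
    .map((t) => t.slice(1, -1))            // strip the surrounding pipes
    .filter((w) => !excluded.includes(w)); // drop unwanted words
}

console.log(wordsExcept("|video| |end| |water| |sun| |cloud|", ["end"]));
console.log(wordsExcept("|video| |end| |water| |sun| |cloud|", ["end", "sun"]));
console.log(wordsExcept("no pipes here", ["end"]));
```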
Upvotes: 2