Reputation: 1064
I'm trying to match words that consist only of characters in this character class: [A-z'\\/%], excluding cases where the word is between:
< and >
[ and ]
{ and }
So, say I've got this funny string:
[beginning]<start>How's {the} /weather (\\today%?)[end]
I need to match the following strings:
[ "How's", "/weather", "\\today%" ]
I've tried using this pattern:
/[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/gm
But for some reason, it matches:
[ "[beginning]", "", "How's", "", "", "", "/weather", "", "", "\\today%", "", "", "[end]", "" ]
I'm not sure why my pattern allows stuff between [ and ], since I used (?![^\[]*\]), and a similar approach seems to work for not matching {these cases} and <these cases>. I'm also not sure why it matches all the empty strings.
Any wisdom? :)
Upvotes: 8
Views: 314
Reputation: 163207
You can match all the cases that you don't want using an alternation, and place the character class in a capturing group to capture what you want to keep.
The [^...] construct is a negated character class: it matches any character except the ones listed in it.
(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)
Explanation
(?:                 Non-capturing group
\[[^\][]*]          Match from an opening [ to its closing ]
|                   Or
<[^<>]*>            Match from an opening < to its closing >
|                   Or
{[^{}]*}            Match from an opening { to its closing }
)                   Close the non-capturing group
|                   Or
([A-Za-z'/\\%]+)    Repeat the character class 1+ times to prevent empty matches, and capture it in group 1

const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = `[beginning]<start>How's {the} /weather (\\\\today%?)[end]`;
let m;
while ((m = regex.exec(str)) !== null) {
  // Group 1 is undefined when a bracketed section matched; skip those.
  if (m[1] !== undefined) console.log(m[1]);
}
Upvotes: 1
Reputation: 3409
Split it with a regular expression:
let data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
let matches = data.split(/\s*(?:<[^>]+>|\[[^\]]+\]|\{[^\}]+\}|[()])\s*/);
console.log(matches.filter(v => "" !== v));
Upvotes: 1
Reputation: 19641
There are essentially two problems with your pattern:
1. Never use A-z in a character class if you intend to match only letters, because it will match more than just letters [1]. Instead, use a-zA-Z (or A-Za-z).
2. Using the * quantifier after the character class will allow empty matches. Use the + quantifier instead.
So, the fixed pattern should be:
[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)
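To check the fixed pattern against the string from the question, here is a minimal sketch. The variable names (fixedPattern, sample, words) are only illustrative, and it assumes the sample really contains two literal backslashes, which is why String.raw is used.

```javascript
// Fixed pattern from this answer: A-Za-z instead of A-z, and + instead of *.
const fixedPattern = /[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)/g;

// Sample string from the question; String.raw keeps both backslashes literal.
const sample = String.raw`[beginning]<start>How's {the} /weather (\\today%?)[end]`;

// With the g flag, match() returns an array of all matched substrings.
const words = sample.match(fixedPattern);
console.log(words);
```

With that input, only How's, /weather and \\today% should be logged. Note that the lookaheads only check whether a closing bracket can be reached before the next opening one, so the approach assumes the brackets in the input are balanced and not nested.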
[1] The [A-z] character class means "match any character with an ASCII code between 65 and 122". The problem is that codes 91 through 96 ([, \, ], ^, _ and `) are not letters, and that's why the original pattern matches characters like '[' and ']'.
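To see exactly which non-letter characters A-z lets through, a small check like the following can enumerate them. This is an illustrative sketch, not part of the original answer, and the name extras is made up for the example.

```javascript
// Collect every ASCII character that [A-z] accepts but [A-Za-z] does not.
const extras = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-z]/.test(ch) && !/[A-Za-z]/.test(ch)) {
    extras.push(`${code}: ${ch}`);
  }
}
console.log(extras);
```

The list holds the six characters that sit between Z (90) and a (97) in ASCII, which includes both square brackets and the backslash.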
Upvotes: 4