Reputation: 91
I am trying to write a regular expression that will match everything in between two specific words but will also discard all substrings of a specific pattern.
For example, if the given sentence is: 'START this is [*9-11*0] a dummy [3-*1] sentence END', I want to write a regular expression to get the answer: this is a dummy sentence
If I only want to match everything in between the words START and END, I can write the regular expression: START(.*?)END
But I also want to discard all the patterns in between that start with [, followed by any combination of digits, hyphens and *, and end with ].
How do I do that?
Upvotes: 2
Views: 89
Reputation: 163287
As suggested in the comments you can use a 2 step approach.
First, match from START to END with no occurrences of START or END in between.
\bSTART\b((?:(?!\b(?:START|END)\b).)*)\bEND\b
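To illustrate why the negative lookahead is there, here is a minimal sketch comparing this pattern with the lazy START(.*?)END from the question. The input string with a repeated START is made up for the comparison, and the variable names are only illustrative.

```javascript
// Hypothetical input containing a second START before the first END.
const nestedInput = "START a START b END";

// Lazy pattern from the question: begins at the first START it finds.
const lazyPattern = /START(.*?)END/;

// Pattern from this answer: the capture may not contain START or END as whole words.
const temperedPattern = /\bSTART\b((?:(?!\b(?:START|END)\b).)*)\bEND\b/;

const lazyCapture = nestedInput.match(lazyPattern)[1];
const temperedCapture = nestedInput.match(temperedPattern)[1];

console.log(JSON.stringify(lazyCapture));     // " a START b "
console.log(JSON.stringify(temperedCapture)); // " b "
```

The lazy version keeps the inner START in its capture, while the lookahead version only matches the innermost START ... END pair.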
See a regex demo.
Then remove each square-bracketed token made up of the allowed characters (digits, * and -) using a repeated character class.
\[[0-9*-]+]
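As a quick sketch of what this character class does and does not remove, assuming JavaScript and two sample strings chosen only for illustration:

```javascript
// The hyphen is last in the class, so it is a literal "-", not a range.
// The closing "]" outside the class needs no escape in this pattern.
const bracketToken = /\[[0-9*-]+]/g;

// Captured text from the question's example, before any cleanup.
const stripped = " this is [*9-11*0] a dummy [3-*1] sentence ".replace(bracketToken, "");

// Brackets holding other characters, or nothing at all, are left alone.
const untouched = "keep [abc] and []".replace(bracketToken, "");

console.log(JSON.stringify(stripped));  // " this is  a dummy  sentence "
console.log(JSON.stringify(untouched)); // "keep [abc] and []"
```

Note the double spaces left behind where the tokens were removed.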
Because of the replacements, double spaces can be left behind. At the end, you can replace every run of 2 or more whitespace characters with a single space, and trim the string to remove leading and trailing whitespace.
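A small sketch of that cleanup step, assuming JavaScript. The sample string is invented to include a run of three spaces, which is where the quantifier {2} (exactly two) and {2,} (two or more) behave differently.

```javascript
// Hypothetical leftover text with gaps of two and three spaces.
const gappy = "  this is   a dummy  sentence ";

// {2} consumes whitespace two characters at a time, so a run of three leaves a double space.
const exactlyTwo = gappy.replace(/\s{2}/g, " ").trim();

// {2,} collapses the whole run into one space.
const twoOrMore = gappy.replace(/\s{2,}/g, " ").trim();

console.log(JSON.stringify(exactlyTwo)); // "this is  a dummy sentence"
console.log(JSON.stringify(twoOrMore));  // "this is a dummy sentence"
```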
There is no language listed, but for example using JavaScript:
const regex = /\bSTART\b((?:(?!\b(?:START|END)\b).)*)\bEND\b/g;
const s = "START this is [*9-11*0] a dummy [3-*1] sentence END";
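// For each START...END match: strip the bracketed tokens,
// collapse runs of 2 or more whitespace chars, then trim.
for (const m of s.matchAll(regex)) {
  console.log(
    m[1].replace(/\[[0-9*-]+]/g, '').replace(/\s{2,}/g, ' ').trim()
  );
}
// Logs: this is a dummy sentence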
Upvotes: 0