user40739

Reputation: 91

Match everything but regular expression inside the string

I am trying to write a regular expression that matches everything between two specific words but also discards all substrings of a specific pattern.

For example, if the given sentence is: 'START this is [*9-11*0] a dummy [3-*1] sentence END', I want to write a regular expression to get the answer: this is a dummy sentence

If I only want to match everything in between the words START and END, I can write regular expression: START(.*?)END

But I also want to discard every substring in between that starts with [, is followed by any combination of digits, hyphens and *, and ends with ].

How do I do that?

Upvotes: 2

Views: 89

Answers (2)

The fourth bird

Reputation: 163287

As suggested in the comments you can use a 2 step approach.

First, match from START to END with no occurrences of START or END in between.

\bSTART\b((?:(?!\b(?:START|END)\b).)*)\bEND\b

See a regex demo.
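To illustrate why the negative lookahead is there, here is a minimal sketch comparing the lazy pattern from the question with the pattern above. The input string with a repeated START is a made-up example, not taken from the question.

```javascript
// Sketch: the input below is a hypothetical string with a stray START in it.
const lazy = /START(.*?)END/;
const tempered = /\bSTART\b((?:(?!\b(?:START|END)\b).)*)\bEND\b/;
const nested = "START a START b END";

// The lazy pattern starts at the first START and keeps the second one inside the group.
const lazyInner = nested.match(lazy)[1];
// The tempered pattern refuses to cross another START, so it matches from the last START.
const temperedInner = nested.match(tempered)[1];

console.log(JSON.stringify(lazyInner));     // " a START b "
console.log(JSON.stringify(temperedInner)); // " b "
```

Whether the innermost pair is the one you want depends on your data; if START can never repeat, the simpler lazy pattern behaves the same.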

Then remove each pair of square brackets that contains only the allowed characters, using a repeated character class.

\[[0-9*-]+]

The replacements can leave gaps of two or more spaces. At the end you can replace every run of 2 or more spaces with a single space, and trim the string to remove its leading and trailing whitespace.

There is no language listed, but for example using JavaScript:

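const regex = /\bSTART\b((?:(?!\b(?:START|END)\b).)*)\bEND\b/g;
const s = "START this is [*9-11*0] a dummy [3-*1] sentence END";
// For each START...END match: strip the bracketed groups from the captured text,
// collapse runs of 2 or more whitespace chars into one space, then trim the ends.
for (const m of s.matchAll(regex)) {
  console.log(
    m[1].replace(/\[[0-9*-]+]/g, '').replace(/\s{2,}/g, ' ').trim()
  );
}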

Upvotes: 0

AnilGoyal

Reputation: 26218

This regex will serve your purpose. It matches the parts to discard, so replace every match with an empty string.

/START|(\[[^\]]*\]\s)|END/g

A demo can be seen at the regex101 link.

Explanation -

  • three alternatives
    1. First, the literal START
    2. Second, a literal [ followed by any characters except ], then the closing ] and one whitespace character
    3. Last, the literal END
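As a rough sketch of how the three alternatives could be applied, assuming JavaScript and the sample sentence from the question: every match is replaced with an empty string and the result is trimmed. The variable names are only illustrative.

```javascript
// Sketch: the pattern matches what should be removed, so each match is replaced with ''.
// The ] inside the class is escaped because JavaScript reads a bare [^] as "any character".
const removable = /START|(\[[^\]]*\]\s)|END/g;
const input = "START this is [*9-11*0] a dummy [3-*1] sentence END";

// Removing START and END leaves a space at each end, hence the trim().
const result = input.replace(removable, '').trim();

console.log(result); // this is a dummy sentence
```

Note that this removes START, END and any bracketed group wherever they occur, including brackets holding characters other than digits, hyphens and *, so it fits best when the input is known to have the shape shown in the question.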

Upvotes: 1
