Koh

Reputation: 2897

Regex: Match patterns except with pattern preceding

I am attempting to write a regular expression to match certain patterns except for those with a preceding pattern. In other words, given the following sentence:

Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8

I would like to match every X.X.X.X that does not have the word paragraph in front of it, i.e. it should only match 5.6.7.8. My current regex seems to match both 1.2.3.4 and 5.6.7.8. I have tried rearranging the lookarounds, but that doesn't fit my use case either.

(?<!paragraph)(?:[\(\)0-9a-zA-Z]+\.)+[\(\)0-9a-zA-Z]+

I code in JavaScript.

EDIT: Note that X.X.X.X are not fixed at 4 Xs. They range from X.X to X.X.X.X.X
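A minimal reproduction of the behaviour described above (a sketch; the g flag is added here only so that String.prototype.match returns every match rather than just the first):

```javascript
// The regex from the question, with the g flag so every match is returned.
const original = /(?<!paragraph)(?:[\(\)0-9a-zA-Z]+\.)+[\(\)0-9a-zA-Z]+/g;
const sentence = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8";

// Both dotted sequences come back, including the one after "paragraph".
const found = sentence.match(original);
console.log(found); // [ '1.2.3.4', '5.6.7.8' ]
```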

Upvotes: 3

Views: 233

Answers (3)

T.J. Crowder

Reputation: 1074018

Your pattern matches because "paragraph" is not the same as "paragraph[space]". Your pattern doesn't have a space. Your text does.
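A quick sketch (not part of the original answer) showing why adding only the space is not the whole fix: the match at 1 is rejected, but the regex then simply starts one part later.

```javascript
// Lookbehind with the space added, but without any X. parts inside it.
const spaceOnly = /(?<!paragraph )(?:[\(\)0-9a-zA-Z]+\.)+[\(\)0-9a-zA-Z]+/;
const sentence = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8";

// "1" is now rejected, but "2" is preceded by "paragraph 1.", not "paragraph ",
// so the negative lookbehind passes and the first match is "2.3.4".
const first = spaceOnly.exec(sentence);
console.log(first[0]); // 2.3.4
```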

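You may want to add the space (perhaps optionally?) to your lookbehind. That alone isn't enough, though: because you want to match a varying number of parts (you've said X.X through X.X.X.X.X), the match could then just start at the second part (2.3.4 in your example). So we need to include X. in the lookbehind as well: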

const rex = /(?<!paragraph *(?:[\(\)0-9a-zA-Z]+\.)*)(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]+/i;

Live Example:

function test(str) {
    const rex = /(?<!paragraph *(?:[\(\)0-9a-zA-Z]+\.)*)(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]+/i;
    const match = rex.exec(str);
    console.log(match ? match[0] : "No match");
}

console.log("Testing four 'digits':");
test("Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8 blah");

console.log("Testing two 'digits':");
test("Don't want to match paragraph 1.2.3.4 but this instead 5.6 blah");

console.log("Testing two 'digits' again:");
test("Don't want to match paragraph 1.2 but this instead 5.6 blah");

console.log("Testing five 'digits':");
test("Don't want to match paragraph 1.2 but this instead 5.6.7.8.9 blah");

That expression requires:

  • That "paragraph", followed by zero or more spaces and then zero or more X. parts, does not appear immediately before the match; and
  • That X. is repeated one to four times ({1,4}); and
  • That a final X follows those repetitions

X in my example is any of (, ), A-Z or 0-9, and I've made the expression case-insensitive (the i flag), but you can tweak as needed.
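One way to make that tweaking easier is to build the pattern from a configurable character class. This is a sketch: buildPattern is a hypothetical helper, and it uses + on the final X so that multi-character parts such as 10 are matched in full.

```javascript
// Hypothetical helper: builds the lookbehind regex from a character class
// for X and a min/max number of "X." repetitions before the final X.
function buildPattern(xClass, minDots, maxDots) {
  const x = `[${xClass}]`;
  const source =
    `(?<!paragraph *(?:${x}+\\.)*)` +       // not preceded by "paragraph" plus earlier X. parts
    `(?:${x}+\\.){${minDots},${maxDots}}` + // X. repeated minDots..maxDots times
    `${x}+`;                                // the final X (one or more characters)
  return new RegExp(source, "gi");          // g so matchAll can list every match
}

const digitsOnly = buildPattern("0-9", 1, 4);
const text = "paragraph 1.2.3.4 but this instead 5.6.7.8 and 9.10";
const found = Array.from(text.matchAll(digitsOnly), (m) => m[0]);
console.log(found); // [ '5.6.7.8', '9.10' ]
```

Passing "()0-9a-z" instead of "0-9" reproduces the character class used in the answer.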


Note that lookbehind was only added to JavaScript recently, in ES2018, so it requires an up-to-date JavaScript environment. If you need lookbehind in older environments, you might check out Steven Levithan's excellent XRegExp library.
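If lookbehind isn't available at all, a common workaround (a sketch, not from the original answer; matchUnlessParagraph is a hypothetical name) is to match the optional "paragraph" prefix in its own capture group and skip any match where that group took part:

```javascript
// Fallback without lookbehind: capture an optional "paragraph " prefix in group 1
// and the dotted sequence in group 2, then keep only matches where group 1 is unset.
// X is assumed to be one or more of ( ) 0-9 a-z, case-insensitive.
function matchUnlessParagraph(str) {
  const re = /(paragraph *)?((?:[()0-9a-z]+\.)+[()0-9a-z]+)/gi;
  const results = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    if (m[1] === undefined) {
      results.push(m[2]);
    }
  }
  return results;
}

console.log(matchUnlessParagraph("Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8"));
// [ '5.6.7.8' ]
```

Because the match that starts at "paragraph" consumes the whole 1.2.3.4, the search resumes after it and cannot start again partway through.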

Also note that variable-length lookbehind like the above is not supported in all regex flavors (but it is supported in JavaScript, in engines that are up-to-date).
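To check support at runtime, one option (a sketch; supportsLookbehind is a hypothetical helper) is to compile a variable-length lookbehind with the RegExp constructor and catch the SyntaxError that engines without lookbehind support throw:

```javascript
// Returns true if this engine accepts a variable-length negative lookbehind.
// Engines without lookbehind support throw a SyntaxError when compiling it.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a *)b");
    return true;
  } catch (e) {
    return false;
  }
}

console.log(supportsLookbehind());
```

The constructor is used instead of a regex literal because a literal with unsupported syntax makes the whole script fail to parse, before any try/catch can run.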

Upvotes: 4

Kunal Mukherjee

Reputation: 5853

You can build the regex step by step:

  1. Ignore any match preceded by the word 'paragraph' and whitespace.
  2. Since your pattern is fixed and consists of a quadruple of numbers separated by periods, it's safe to assume that each number in the quadruple has at least one digit.
  3. Capture the quadruple of numbers in a capturing group to be used later.
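The three steps can also be composed explicitly, one piece per step. This sketch produces the same pattern as the regex below, with \d+ written in place of the equivalent \d{1,}:

```javascript
// Step 1: reject anything preceded by "paragraph" plus whitespace.
const notAfterParagraph = String.raw`(?<!paragraph\s+)`;
// Step 2: four runs of at least one digit, separated by literal periods.
const quadruple = String.raw`\d+\.\d+\.\d+\.\d+`;
// Step 3: capture the quadruple so it can be read back from group 1.
const re = new RegExp(`${notAfterParagraph}(${quadruple})`, "gi");

const inputData = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8 and 12.2.333.2";
const found = Array.from(inputData.matchAll(re), (m) => m[1]);
console.log(found); // [ '5.6.7.8', '12.2.333.2' ]
```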


const inputData = 'Don\'t want to match paragraph 1.2.3.4 but this instead 5.6.7.8 and 12.2.333.2';
const re = /(?<!paragraph\s+)(\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})/ig;

const matchedGroups = inputData.matchAll(re);

for (const matchedGroup of matchedGroups) {
	console.log(matchedGroup);
}

Upvotes: 0

nsevens

Reputation: 2835

If you always want to match a 4-item group, you can do it like this:

(?<!paragraph )([0-9]+\.?){4}
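An unescaped . in a regex matches any character, and an optional separator can also match nothing, so a pattern like ([0-9]+.?){4} accepts more than dotted quadruples. A quick check, alongside a stricter variant that requires a literal dot between each of the four numbers (the stricter pattern is a suggested alternative, not part of the answer):

```javascript
// Unescaped "." matches any character, and "?" lets it match nothing at all.
const loose = /(?<!paragraph )([0-9]+.?){4}/;
// Stricter sketch: four digit runs separated by literal dots.
const strict = /(?<!paragraph )[0-9]+(?:\.[0-9]+){3}/;

for (const s of ["version 1a2b3c4", "year 1234", "instead 5.6.7.8"]) {
  const a = loose.exec(s);
  const b = strict.exec(s);
  console.log(s, "| loose:", a && a[0], "| strict:", b && b[0]);
}
```

The loose pattern matches all three inputs; the strict one matches only 5.6.7.8.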

Upvotes: 0
