Reputation: 2897
I am attempting to write a regular expression to match certain patterns except for those with a preceding pattern. In other words given the following sentence:
Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8
I would like to match every X.X.X.X that does not have the word paragraph in front of it, i.e. it should only match 5.6.7.8. My current regex, shown below, matches both 1.2.3.4 and 5.6.7.8. I have tried rearranging the lookarounds, but none of the variations fits my use case.
(?<!paragraph)(?:[\(\)0-9a-zA-Z]+\.)+[\(\)0-9a-zA-Z]+
I am working in JavaScript.
EDIT: Note that the X.X.X.X groups are not fixed at four Xs. They range from X.X to X.X.X.X.X.
Upvotes: 3
Views: 233
Reputation: 1074018
Your pattern matches because "paragraph" is not the same as "paragraph[space]". Your pattern doesn't have a space. Your text does.
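A quick way to see this for yourself is to run the pattern from the question against the sample sentence. This is only an illustrative sketch; the names questionRex, questionText and questionMatches are made up for the example.

```javascript
// The question's pattern: the lookbehind has no space after "paragraph".
const questionRex = /(?<!paragraph)(?:[()0-9a-zA-Z]+\.)+[()0-9a-zA-Z]+/g;
const questionText = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8";

// The text right before "1.2.3.4" is "paragraph " (with a trailing space).
// That does not end in "paragraph", so the negative lookbehind succeeds
// and 1.2.3.4 is matched along with 5.6.7.8.
const questionMatches = questionText.match(questionRex);
console.log(questionMatches); // [ '1.2.3.4', '5.6.7.8' ]
```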
You may want to add the space (perhaps conditionally?) to your lookbehind. Because you want to match a varying number of X.X.X.X (you've said X.X through X.X.X.X.X), we need to include X. in the lookbehind as well; otherwise the match can simply start at the second item and return 2.3.4:
const rex = /(?<!paragraph *(?:[\(\)0-9a-zA-Z]+\.)*)(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]/i;
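As a rough illustration of why the X. repetition belongs inside the lookbehind, the sketch below compares a version that only adds the space with the expression above. The variable names (spaceOnly, withItems, compareText) are invented for the example, and the g flag is added only so that every match is listed.

```javascript
const compareText = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8";

// Only the space added: the match can no longer start at "1",
// but nothing stops it from starting at "2".
const spaceOnly = /(?<!paragraph )(?:[()0-9a-zA-Z]+\.)+[()0-9a-zA-Z]+/g;

// Space plus any number of leading "X." items inside the lookbehind:
// every possible start position within 1.2.3.4 is now rejected.
const withItems = /(?<!paragraph *(?:[()0-9a-zA-Z]+\.)*)(?:[()0-9a-zA-Z]+\.){1,4}[()0-9a-zA-Z]/gi;

const spaceOnlyMatches = compareText.match(spaceOnly);
const withItemsMatches = compareText.match(withItems);
console.log(spaceOnlyMatches); // [ '2.3.4', '5.6.7.8' ]
console.log(withItemsMatches); // [ '5.6.7.8' ]
```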
Live Example:
function test(str) {
    const rex = /(?<!paragraph *(?:[\(\)0-9a-zA-Z]+\.)*)(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]/i;
    const match = rex.exec(str);
    console.log(match ? match[0] : "No match");
}
console.log("Testing four 'digits':");
test("Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8 blah");
console.log("Testing two 'digits':");
test("Don't want to match paragraph 1.2.3.4 but this instead 5.6 blah");
console.log("Testing two 'digits' again:");
test("Don't want to match paragraph 1.2 but this instead 5.6 blah");
console.log("Testing five 'digits':");
test("Don't want to match paragraph 1.2 but this instead 5.6.7.8.9 blah");
That expression requires that:
paragraph, followed by zero or more spaces and then X. zero or more times, is not immediately prior to the match; and
X. is repeated one to four times ({1,4}); and
X immediately follows those repetitions.
X in my example is A-Z, 0-9, or a parenthesis, and I've made the expression case-insensitive, but you can tweak as needed.
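If it helps to see the three requirements separately, here is one possible way to assemble the same expression from named pieces. This is just a sketch: buildRex and the piece names (X, notPreceded, items, last) are hypothetical and not part of any library.

```javascript
// Hypothetical helper that assembles the expression from its three parts.
function buildRex() {
  const X = String.raw`[()0-9a-zA-Z]`;                          // one character of an item
  const notPreceded = String.raw`(?<!paragraph *(?:${X}+\.)*)`; // requirement 1
  const items = String.raw`(?:${X}+\.){1,4}`;                   // requirement 2
  const last = X;                                               // requirement 3
  return new RegExp(notPreceded + items + last, "i");
}

const builtRex = buildRex();
const builtResult = builtRex.exec(
  "Don't want to match paragraph 1.2 but this instead 5.6.7.8.9 blah"
);
console.log(builtResult ? builtResult[0] : "No match"); // 5.6.7.8.9
```

Note that the final piece has no + quantifier, so only the first character of a longer last item is included in the match; appending + to it is one possible tweak if your items can be more than one character long.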
Note that lookbehind was only added to JavaScript recently, in ES2018, so support requires up-to-date JavaScript environments. If you need lookbehind on older environments, you might check out Steven Levithan's excellent XRegExp library.
Also note that variable-length lookbehind like the above is not supported in all languages (but is supported in JavaScript...in engines that are up-to-date).
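If you cannot be sure the engine supports lookbehind, one common approach is to feature-detect it and fall back to a different technique. The sketch below is an assumption-laden example, not a drop-in replacement: supportsLookbehind and findWithoutLookbehind are hypothetical names, and the fallback swaps the lookbehind for an optional captured prefix that is filtered out afterwards (it also does not enforce the {1,4} limit).

```javascript
// Returns true if the engine can compile a lookbehind assertion.
// The pattern is built from a string so that an unsupported engine
// throws at runtime instead of failing to parse the whole script.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (err) {
    return false;
  }
}

// Fallback without lookbehind: optionally capture a "paragraph" prefix
// (group 1) together with the dotted item (group 2), then keep only the
// matches where the prefix is absent.
function findWithoutLookbehind(str) {
  const rex = /(paragraph\s*)?((?:[()0-9a-z]+\.)+[()0-9a-z]+)/gi;
  const results = [];
  let m;
  while ((m = rex.exec(str)) !== null) {
    if (m[1] === undefined) {
      results.push(m[2]);
    }
  }
  return results;
}

console.log(supportsLookbehind());
console.log(
  findWithoutLookbehind("Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8")
); // [ '5.6.7.8' ]
```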
Upvotes: 4
Reputation: 5853
You can build the regex iteratively:
const inputData = 'Don\'t want to match paragraph 1.2.3.4 but this instead 5.6.7.8 and 12.2.333.2';
const re = /(?<!paragraph\s+)(\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})/ig;
const matchedGroups = inputData.matchAll(re);
for (const matchedGroup of matchedGroups) {
    console.log(matchedGroup);
}
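The pattern above is fixed at four items, whereas the question's edit says the groups range from X.X to X.X.X.X.X. One possible variation, sketched here under the assumption that items are digits only, allows two to five items and moves the optional leading items into the lookbehind. The names flexibleRe, flexibleInput and flexibleHits are illustrative.

```javascript
const flexibleInput =
  "Don't want to match paragraph 12.2.333.2 but this instead 5.6 and 10.20.30";

// \d+(?:\.\d+){1,4} allows two to five items.
// (?:\d+\.)* inside the lookbehind rejects starts in the middle of a
// group that follows "paragraph"; \b rejects starts in the middle of a number.
const flexibleRe = /(?<!paragraph\s+(?:\d+\.)*)\b\d+(?:\.\d+){1,4}\b/gi;

const flexibleHits = [...flexibleInput.matchAll(flexibleRe)].map((m) => m[0]);
console.log(flexibleHits); // [ '5.6', '10.20.30' ]
```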
Upvotes: 0
Reputation: 2835
If you always want to match a 4-item group, you can do it like this:
(?<!paragraph )([0-9]+\.?){4}
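A small sketch for trying this out on the sample sentence, with the dot escaped so that it only matches a literal period. Because the dot is optional, the loose form also accepts four plain digits; the stricter alternative shown next to it is an assumption about what is wanted, and both are only checked against the single-digit sample here.

```javascript
const fourLine = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8";

// Loose form: a run of digits with an optional dot, four times.
const fourLoose = /(?<!paragraph )(?:[0-9]+\.?){4}/g;
// Stricter form: digits, then exactly three ".digits" items.
const fourStrict = /(?<!paragraph )[0-9]+(?:\.[0-9]+){3}/g;

console.log(fourLine.match(fourLoose));     // [ '5.6.7.8' ]
console.log(fourLine.match(fourStrict));    // [ '5.6.7.8' ]
console.log("call 1234".match(fourLoose));  // [ '1234' ]
console.log("call 1234".match(fourStrict)); // null
```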
Upvotes: 0