merlin

Reputation: 2927

How to exclude words from match in regex?

I am trying to exclude URLs that contain /com/de/cms/ from a match, e.g.:

match this:

www.example.com/catname/all-from-category/?pageNumber=1

but not this:

example.com/com/de/cms/catname/all-from-category/?pageNumber=3

Regex:

^[^com\/de\/cms\/]+\/all-from-category\/\?pageNumber=\d(&hitsPerPage=\d)?

https://regex101.com/r/Mqpspq/1

How can I exclude URLs with com/de/cms/ while matching the other URL?

Upvotes: 0

Views: 853

Answers (1)

Youssef13

Reputation: 4986

There are a couple of mistakes in your regex.

  1. The first ^ matches the start of the string, or the start of a line if multiline mode is enabled, so the rest of the pattern has to match from the very beginning of the URL.

  2. The [^com\/de\/cms\/] part is a negated character class: it matches any single character except c, o, m, /, d, e, or s. Your intent was to exclude the substring com/de/cms as a whole. That can be done with a negative lookbehind, like this: (?<!com\/de\/cms)

  3. You're missing the catname part.
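To make point 2 concrete, here is a minimal Python sketch of how the negated character class behaves. Python's re module is an assumption here (the question does not name a regex flavor), and the test strings "code" and "www." are made up for illustration.

```python
import re

# A negated character class excludes single characters, not a substring.
# This one rejects each of: c, o, m, /, d, e, s.
char_class = re.compile(r"[^com/de/cms/]+")

# On the URL from the question it stops at the first excluded character,
# the "e" of "example", so it can never reach "/all-from-category".
prefix = char_class.match("www.example.com/catname/").group()

# "code" is built only from excluded characters, so nothing matches,
# even though the text "com/de/cms/" does not appear in it.
no_match = char_class.fullmatch("code")

print(prefix)    # www.
print(no_match)  # None
```

This is why the original pattern fails on the URL you do want to match: the class gives up long before the /all-from-category part.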

A working regex would be:

(?<!com\/de\/cms)\/catname\/all-from-category\/\?pageNumber=\d

The regex above simply says the following:

Please, match /catname/all-from-category/?pageNumber=SOME_DIGIT that is not preceded by com/de/cms.
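As a quick check, the regex can be run against the two URLs from the question. This is only a sketch assuming Python's re module, which supports fixed-width lookbehind; the forward slashes are left unescaped because Python patterns have no / delimiter.

```python
import re

# Negative lookbehind: "/catname/..." must not be directly preceded
# by the literal text "com/de/cms".
pattern = re.compile(
    r"(?<!com/de/cms)/catname/all-from-category/\?pageNumber=\d"
)

urls = [
    "www.example.com/catname/all-from-category/?pageNumber=1",
    "example.com/com/de/cms/catname/all-from-category/?pageNumber=3",
]

# search() is used because the pattern is not anchored with ^.
results = [bool(pattern.search(url)) for url in urls]

for url, matched in zip(urls, results):
    print(matched, url)
```

The first URL matches and the second does not. Note that the lookbehind only inspects the text immediately before /catname, so a URL with com/de/cms/ somewhere earlier in the path would still match.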

Regexr.

Upvotes: 1
