Rajat

Reputation: 5803

Use a non-capturing group only in the presence of certain substrings

Consider this URL for example:

domain.com/search.action?zip=94558&year_max=2018

I am trying to build a regex that captures the domain if the URL satisfies either of these two conditions:

  1. The URL doesn't have a year_max parameter.
  2. The URL does have a year_max parameter and its value is one of 2019, 2020, 2021 or 2022.

Update: The URLs can contain multiple & characters and may have other parameters after year_max. The only certainty is that the year_max parameter is the last one that needs matching (if it exists), and all other parameters would have their capture groups defined before it.

Here is my attempt so far:

(domain\.com)\/.*(?:&year_max=2019|2020|2021|2022)?

How do I modify it so that, given a set of URLs like the one below, it matches only the first and the third URL?

domain.com/search.action?zip=94558&year_max=2020
domain.com/search.action?zip=94558&year_max=2017
domain.com/search.action?zip=94558
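
To reproduce the problem outside the fiddle, here is a minimal Python sketch (it assumes `re.search` semantics, and the variable names are only for illustration). The attempt matches all three URLs, because the trailing group is optional and `.*` already consumes the rest of the string:

```python
import re

# The attempt from above, unchanged.
ATTEMPT = re.compile(r"(domain\.com)\/.*(?:&year_max=2019|2020|2021|2022)?")

urls = [
    "domain.com/search.action?zip=94558&year_max=2020",
    "domain.com/search.action?zip=94558&year_max=2017",
    "domain.com/search.action?zip=94558",
]

# True means the URL matched; the second entry should be False but is not.
attempt_results = [bool(ATTEMPT.search(url)) for url in urls]
print(attempt_results)  # [True, True, True]
```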

Regex 101 Fiddle

Upvotes: 1

Views: 35

Answers (3)

logi-kal

Reputation: 7880

The domain must not be followed by a year_max that is not followed by 2019, 2020, 2021 or 2022.

The most intuitive translation of this double negation is the use of two negative lookaheads:

(domain\.com)\/(?!.*&year_max=(?!20(?:19|2[0-2])))

See demo: https://regex101.com/r/D88X4n/1

Notice that lookaheads are zero-length assertions (they don't consume characters), so you could even use them along with standard capturing groups for matching other parameters.
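
As a rough check in Python (assuming `re.search`; the helper name `captured_domain` and the extra `zip` capture group are hypothetical and only there to illustrate the point about combining lookaheads with other groups):

```python
import re

# Pattern from this answer, unchanged.
DOUBLE_LOOKAHEAD = re.compile(
    r"(domain\.com)\/(?!.*&year_max=(?!20(?:19|2[0-2])))"
)

# Hypothetical extension: the zip group is an assumption added for illustration.
# The lookahead consumes nothing, so the rest of the pattern starts right after "/".
WITH_ZIP = re.compile(
    r"(domain\.com)\/(?!.*&year_max=(?!20(?:19|2[0-2]))).*?[?&]zip=(\d+)"
)


def captured_domain(url):
    """Return group 1 if the URL is accepted, otherwise None."""
    match = DOUBLE_LOOKAHEAD.search(url)
    return match.group(1) if match else None


print(captured_domain("domain.com/search.action?zip=94558&year_max=2020"))  # domain.com
print(captured_domain("domain.com/search.action?zip=94558&year_max=2017"))  # None
print(captured_domain("domain.com/search.action?zip=94558"))                # domain.com
```

Because nothing is anchored to the end of the string, parameters after year_max do not change the result.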

Upvotes: 2

The fourth bird

Reputation: 163632

For the example URLs with a single occurrence of &, you can use:

^(domain\.com)/[^\s&]+(?:&year_max=(?:2019|202[012]))?$
  • ^ Start of string
  • (domain\.com) Capture domain.com in group 1
  • /[^\s&]+ Match / and 1+ occurrences of any character except & or a whitespace character
  • (?: Non-capturing group
    • &year_max=(?:2019|202[012]) Match &year_max= followed by one of 2019, 2020, 2021 or 2022
  • )? Close the non-capturing group and make it optional
  • $ End of string

Regex demo

As you have selected Python in the regex101 tool, you don't have to escape the /
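
A small Python sketch of how this could be used (the helper name `match_single_amp` is just for illustration). It also shows the limitation stated at the top: the pattern assumes year_max is the only parameter introduced by &, so a URL with further parameters is rejected:

```python
import re

# Pattern from this answer, with the unescaped "/".
SINGLE_AMP = re.compile(r"^(domain\.com)/[^\s&]+(?:&year_max=(?:2019|202[012]))?$")


def match_single_amp(url):
    """Return group 1 if the whole URL fits the pattern, otherwise None."""
    match = SINGLE_AMP.search(url)
    return match.group(1) if match else None


print(match_single_amp("domain.com/search.action?zip=94558&year_max=2020"))  # domain.com
print(match_single_amp("domain.com/search.action?zip=94558&year_max=2017"))  # None
print(match_single_amp("domain.com/search.action?zip=94558"))                # domain.com

# A second "&" after year_max is outside what this pattern was written for.
print(match_single_amp("domain.com/search.action?zip=94558&year_max=2021&page=2"))  # None
```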

Upvotes: 1

trincot

Reputation: 351288

You could use a negative look-ahead for the case where the URL-query-parameter is not included:

^(domain\.com)\/(?:.*&year_max=(?:2019|2020|2021|2022)|(?!.*&year_max=)).*$
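
A quick way to try it in Python (a sketch only; the helper name `domain_if_allowed` is made up for this example). The first alternative handles an allowed year, and the second alternative, the negative look-ahead, handles URLs without year_max:

```python
import re

# Pattern from this answer, unchanged.
ALTERNATION = re.compile(
    r"^(domain\.com)\/(?:.*&year_max=(?:2019|2020|2021|2022)|(?!.*&year_max=)).*$"
)


def domain_if_allowed(url):
    """Return group 1 if the URL is accepted, otherwise None."""
    match = ALTERNATION.search(url)
    return match.group(1) if match else None


print(domain_if_allowed("domain.com/search.action?zip=94558&year_max=2020"))  # domain.com
print(domain_if_allowed("domain.com/search.action?zip=94558&year_max=2017"))  # None
print(domain_if_allowed("domain.com/search.action?zip=94558"))                # domain.com

# The trailing .* lets other parameters follow year_max.
print(domain_if_allowed("domain.com/search.action?zip=94558&year_max=2021&page=2"))  # domain.com
```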

Upvotes: 1
