Reputation: 5803
Consider this URL for example:
domain.com/search.action?zip=94558&year_max=2018
I am trying to build a regex that captures domain if URL satisfies either of these 2 conditions
year_max
parameter.year_max
parameter and it takes one of these
values (2019,2020,2021,2022) for year.Update: The urls can have multiple &
and could have other parameters after year_max
. The only certainty is that year_max
paramter is the last one that needs matching (if it exists) and all other parameters would have their capture groups defined before it.
Here is my attempt so far:
(domain\.com)\/.*(?:&year_max=2019|2020|2021|2022)?
How do I modify it so that if I have a set of URLs like below, it only matches on the first and the 3rd URL?
domain.com/search.action?zip=94558&year_max=2020
domain.com/search.action?zip=94558&year_max=2017
domain.com/search.action?zip=94558
Upvotes: 1
Views: 35
Reputation: 7880
The domain must not be followed by a year_max
that is not followed by 2019, 2020, 2021 or 2022.
The most intuitive translation of this double negation is the use of two negative lookaheads:
(domain\.com)\/(?!.*&year_max=(?!20(?:19|2[0-2])))
See demo: https://regex101.com/r/D88X4n/1
Notice that lookaheads are zero-length assertions (they don't consume characters), so you could even use them along with standard capturing groups for matching other parameters.
Upvotes: 2
Reputation: 163632
For the example urls with a single occurrence of &
, you can use:
^(domain\.com)/[^\s&]+(?:&year_max=(?:2019|202[012]))?$
^
Start of string(domain\.com)
Capture domain\.com
in group 1/[^\s&]+
Match /
and 1+ occurrences of any char except &
or a whitepace chars(?:
Non capture group
&year_max=(?:2019|202[012])
Match &year_max=
and either 2019 2020 2021 2022)?
Close the non capture group and make it optional$
End of stringAs you have selected Python in the regex101 tool, you don't have to escape the /
Upvotes: 1
Reputation: 351288
You could use a negative look-ahead for the case where the URL-query-parameter is not included:
^(domain\.com)\/(?:.*&year_max=(?:2019|2020|2021|2022)|(?!.*&year_max=)).*$
Upvotes: 1