Reputation:
x='http://example.bol.com/click/click?p=1&t=url&s=IDHERE&url=https://www.bol.com/nl/p/jbl-e55bt-draadloze-over-ear-koptelefoon-zwart/9200000064299118&f=TXL&name=/koptelefoon/'
x1='https://example.net/click/camref:IDhere/destination:https://www.mywebsite.com/product/138/sony-ps4.html&q=electronics'
x2='https://example.hn/clickbtn/camref:IDhere/creativeref:IDHERE/destination:https://www.coolblue.nl/product/465/sony-ps4-zwart'
My regex so far:
https?:\/\/www.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:\/|\?).*?(?:\.html|\.php|\&)
I have two small issues: how can I make the pattern exclude the "&" (stop right before the first &), and how can I also capture the x2 link?
Upvotes: 2
Views: 50
Reputation: 163207
To get all the matches from the example data, you might use a negated character class [^&\s]* to match any char except & or a whitespace char, after matching / or ?
https?:\/\/www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)[\/?][^&\s]*
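As a quick check, here is a minimal Python sketch that runs this pattern against the three sample links from the question. The names PATTERN, links and extract_destination are made up for illustration, and the slashes are left unescaped because Python's re module does not need \/.

```python
import re

# Pattern from this answer: after / or ?, consume everything up to
# (but not including) the first "&" or whitespace character.
PATTERN = re.compile(
    r"https?://www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)[/?][^&\s]*"
)

# The three sample links from the question (x, x1, x2).
links = [
    "http://example.bol.com/click/click?p=1&t=url&s=IDHERE&url=https://www.bol.com/nl/p/jbl-e55bt-draadloze-over-ear-koptelefoon-zwart/9200000064299118&f=TXL&name=/koptelefoon/",
    "https://example.net/click/camref:IDhere/destination:https://www.mywebsite.com/product/138/sony-ps4.html&q=electronics",
    "https://example.hn/clickbtn/camref:IDhere/creativeref:IDHERE/destination:https://www.coolblue.nl/product/465/sony-ps4-zwart",
]


def extract_destination(link):
    """Return the embedded destination URL, or None if nothing matches."""
    match = PATTERN.search(link)
    return match.group(0) if match else None


for link in links:
    print(extract_destination(link))
# https://www.bol.com/nl/p/jbl-e55bt-draadloze-over-ear-koptelefoon-zwart/9200000064299118
# https://www.mywebsite.com/product/138/sony-ps4.html
# https://www.coolblue.nl/product/465/sony-ps4-zwart
```

Because [^&\s]* can also match up to the end of the string, the x2 link is captured even though nothing follows it.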
Explanation
https?:\/\/www\.
Match the protocol with an optional s and a mandatory www. part
(?:mywebsite|coolblue|bol)
Match one of the alternatives
\.(?:com|nl)
Match a dot (note that the dot is escaped) and either com or nl
[\/?]
Match either a / or ?
[^&\s]*
Match 0 or more occurrences of any char except & or a whitespace char
Upvotes: 0
Reputation: 10929
Here's the changed regex:
https?:\/\/www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:\/|\?).*?(?=&|')
I first removed the last part of your regex, since it's not needed.
I then added:
(?=&|')
This is a so-called positive lookahead. It starts with (?= and ends with ).
It looks forward (to the right) to check for what's in the parentheses, here the ampersand (&) OR the single quote ('). However, it does NOT add this to the final regex match; it only 'looks'.
There are a lot of posts here on regex where you can look up more info on 'positive lookahead'.
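To illustrate, here is a small Python sketch of the lookahead version, assuming the regex is run over the raw text that still contains the surrounding single quotes. The names LOOKAHEAD, WITH_END, text and bare are invented for this example, the dot after www is escaped, and the |$ variant is only one possible adjustment, not part of the regex above.

```python
import re

# Lookahead version: lazily consume characters until the next char is & or '.
LOOKAHEAD = re.compile(
    r"https?://www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:/|\?).*?(?=&|')"
)

# Raw text as in the question, with the single quotes still present.
text = (
    "x1='https://example.net/click/camref:IDhere/destination:"
    "https://www.mywebsite.com/product/138/sony-ps4.html&q=electronics'\n"
    "x2='https://example.hn/clickbtn/camref:IDhere/creativeref:IDHERE/"
    "destination:https://www.coolblue.nl/product/465/sony-ps4-zwart'"
)

matches = LOOKAHEAD.findall(text)
print(matches)
# ['https://www.mywebsite.com/product/138/sony-ps4.html',
#  'https://www.coolblue.nl/product/465/sony-ps4-zwart']

# Without a trailing & or ', the lookahead can never succeed, so a bare
# string that simply ends after the URL does not match.
bare = "https://www.coolblue.nl/product/465/sony-ps4-zwart"
print(LOOKAHEAD.search(bare))  # None

# Hypothetical adjustment: also accept the end of the string.
WITH_END = re.compile(
    r"https?://www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:/|\?).*?(?=&|'|$)"
)
print(WITH_END.search(bare).group(0))
# https://www.coolblue.nl/product/465/sony-ps4-zwart
```

The matched text never contains the & or the quote itself, because a lookahead only checks what comes next without consuming it.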
Upvotes: 1