user11322373

Regex to extract a link from a URL

x='http://example.bol.com/click/click?p=1&t=url&s=IDHERE&url=https://www.bol.com/nl/p/jbl-e55bt-draadloze-over-ear-koptelefoon-zwart/9200000064299118&f=TXL&name=/koptelefoon/'

x1='https://example.net/click/camref:IDhere/destination:https://www.mywebsite.com/product/138/sony-ps4.html&q=electronics'

x2='https://example.hn/clickbtn/camref:IDhere/creativeref:IDHERE/destination:https://www.coolblue.nl/product/465/sony-ps4-zwart'

My regex so far:

https?:\/\/www.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:\/|\?).*?(?:\.html|\.php|\&)

I have two small issues: how can I make the pattern stop right before the first "&" instead of including it, and how can I also capture the link in x2?

Upvotes: 2

Views: 50

Answers (2)

The fourth bird

Reputation: 163207

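To match all three links in the example data, you could use a negated character class, [^&\s]*, after matching / or ?. It matches any character except & or whitespace, so the match stops right before the first & and, as in x2, can also run to the end of the string.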

https?:\/\/www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)[\/?][^&\s]*

Explanation

  • https?:\/\/www\. Match the protocol with an optional s and a mandatory www. part
  • (?:mywebsite|coolblue|bol) Match one of the alternatives
  • \.(?:com|nl) Match a literal dot (note that the dot is escaped) and either com or nl
  • [\/?] Match either a / or a ?
  • [^&\s]* Match 0 or more occurrences of any char except & or a whitespace char
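
As a quick check of the pattern above, here is a minimal Python sketch run against the three strings from the question. Python is an assumption based on the x='...' assignments in the question, and the helper name extract_link is made up for illustration. The forward slashes are left unescaped because Python does not need them escaped.

```python
import re

# Sample inputs copied from the question.
urls = [
    "http://example.bol.com/click/click?p=1&t=url&s=IDHERE&url=https://www.bol.com/nl/p/jbl-e55bt-draadloze-over-ear-koptelefoon-zwart/9200000064299118&f=TXL&name=/koptelefoon/",
    "https://example.net/click/camref:IDhere/destination:https://www.mywebsite.com/product/138/sony-ps4.html&q=electronics",
    "https://example.hn/clickbtn/camref:IDhere/creativeref:IDHERE/destination:https://www.coolblue.nl/product/465/sony-ps4-zwart",
]

# Same pattern as in the answer, without the optional escaping of "/".
PATTERN = re.compile(r"https?://www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)[/?][^&\s]*")


def extract_link(text):
    """Return the first embedded target link, or None if nothing matches.

    extract_link is a hypothetical helper name, not part of the question.
    """
    match = PATTERN.search(text)
    return match.group(0) if match else None


for url in urls:
    print(extract_link(url))
```

Because [^&\s]* needs no terminator, the third string matches even though nothing follows the link.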

Upvotes: 0

Poul Bak

Reputation: 10929

Here's the changed regex:

https?:\/\/www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:\/|\?).*?(?=&|')

I first removed the last part of your regex, (?:\.html|\.php|\&), since it's not needed.

I then added:

(?=&|')

This is a so-called positive lookahead: it starts with (?= and ends with ).

It looks ahead (to the right) for what is inside the parentheses, here an ampersand (&) OR a single quote ('). However, it does NOT add that text to the final regex match; it only 'looks'.
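
To see the lookahead at work, here is a small Python sketch (Python is assumed from the x='...' assignments in the question). One caveat it illustrates: the lookahead needs an actual & or ' after the link, so it matches the quoted source text but not the bare string value of x2. The $ alternative in the second pattern is my own addition for that case, not part of the regex above.

```python
import re

# x2 as it appears in source form, with the surrounding quotes still present.
quoted = "x2='https://example.hn/clickbtn/camref:IDhere/creativeref:IDHERE/destination:https://www.coolblue.nl/product/465/sony-ps4-zwart'"
# The bare string value, i.e. the text between the quotes.
bare = quoted.split("'")[1]

# The lookahead pattern from this answer ("/" needs no escaping in Python).
LOOKAHEAD = re.compile(
    r"https?://www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:/|\?).*?(?=&|')"
)
# Variant that also accepts the end of the string (assumption, not in the answer).
LOOKAHEAD_OR_END = re.compile(
    r"https?://www\.(?:mywebsite|coolblue|bol)\.(?:com|nl)(?:/|\?).*?(?=&|'|$)"
)

# The quote is seen by the lookahead but is not part of the match.
print(LOOKAHEAD.search(quoted).group(0))
# No & or ' follows the link in the bare value, so nothing matches.
print(LOOKAHEAD.search(bare))
# With the $ alternative, the link at the end of the string is found.
print(LOOKAHEAD_OR_END.search(bare).group(0))
```

If the links are always matched inside quoted text, the original lookahead is enough; the $ variant only matters when the link can be the very last thing in the input.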

There are a lot of posts here on regex where you can look up more info on 'positive lookahead'.

Upvotes: 1
