Reputation: 1746
In a previous post I asked for help rewriting a regex without negation.
Starting regex:
https?:\/\/(?:.(?!https?:\/\/))+$
Ended up with:
https?:[^:]*$
This works fine, but I've noticed that if the URL contains a : besides the one after http or https, it does not match.
Here is a string which is not working:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2
You can notice the :query2
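A minimal Go sketch that reproduces the problem. The helper name matchColonFree and the first, colon-free input are made up for illustration; only the regex and the failing string come from this post.

```go
package main

import (
	"fmt"
	"regexp"
)

// colonFree is the second regex from the question: after "http:" or "https:"
// it allows no further ':' all the way to the end of the string.
var colonFree = regexp.MustCompile(`https?:[^:]*$`)

// matchColonFree (hypothetical helper) returns the match of colonFree in s,
// or "" when there is no match.
func matchColonFree(s string) string {
	return colonFree.FindString(s)
}

func main() {
	// Made-up variant of the input with no extra ':' in the last URL: matches.
	fmt.Printf("%q\n", matchColonFree("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/query2"))
	// The failing input from the question: ":query2" prevents any match.
	fmt.Printf("%q\n", matchColonFree("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2"))
}
```

The second call prints an empty string because [^:]* can never reach the end of the input once a : follows the scheme.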
How can I modify the second regex above so that it also matches URLs that contain a colon (:)?
Expected output:
http://websites.com/path/subpath/cc:query2
Also, I would like to select everything up to the first occurrence of ?=param
Input:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param
Output:
http://websites.com/path/subpath/cc:query2/text/
Upvotes: 2
Views: 62
Reputation: 627126
It is a pity that Go regex does not support lookarounds. However, you can obtain the last link with a sort of a trick: match all possible links and other characters greedily and capture the last link with a capturing group:
^(?:https?://|.)*(https?://\S+?)(?:\?=|$)
Together with the lazy non-whitespace pattern \S+?, this also lets you capture the link only up to the first ?= (or up to the end of the string when there is no ?=).
See regex demo and Go demo
var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2", -1)[0][1])
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param", -1)[0][1])
Results:
"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"
If the last link can contain spaces, use .+? instead of \S+?:
^(?:https?://|.)*(https?://.+?)(?:\?=|$)
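To compare the two patterns side by side, here is a self-contained sketch. The helper capture, the variable names, and the third and fourth inputs are assumptions made for illustration; the two regexes and the first two inputs are the ones used above. Unlike indexing the result with [0][1], the helper checks for a nil result, so an input without any link yields an empty string rather than an index-out-of-range panic.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Non-whitespace variant: the captured link cannot contain spaces.
	noSpaceRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
	// Any-character variant: the captured link may contain spaces.
	withSpaceRe = regexp.MustCompile(`^(?:https?://|.)*(https?://.+?)(?:\?=|$)`)
)

// capture (hypothetical helper) returns group 1 of re in s,
// or "" when re does not match at all.
func capture(re *regexp.Regexp, s string) string {
	if m := re.FindStringSubmatch(s); m != nil {
		return m[1]
	}
	return ""
}

func main() {
	inputs := []string{
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2",
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param",
		// Made-up input whose last link contains a space.
		"see http://a.example/x and http://b.example/my path/file?=1",
		// Made-up input with no link at all.
		"no link here",
	}
	for _, in := range inputs {
		fmt.Printf("\\S+?: %q\n", capture(noSpaceRe, in))
		fmt.Printf(".+?:   %q\n", capture(withSpaceRe, in))
	}
}
```

For the third input, the \S+? variant finds nothing, because no link runs without whitespace up to ?= or the end of the string, while the .+? variant captures the last link including the space. Note that in Go's regexp syntax . does not match a newline unless the s flag is set, so multi-line input would need something like a (?s) prefix.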
Upvotes: 4