Write regex without negations

Question

In a previous post I asked for help rewriting a regex without negation.

Starting regex:

https?:\/\/(?:.(?!https?:\/\/))+$

Ended up with:

https?:[^:]*$

This works fine, but I've noticed that if the URL contains a : in addition to the one after http or https, it is not selected.

Here is a string which is not working:

sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2

You can notice the :query2

How can I modify the second regex listed here so that it selects URLs which contain a :?

Expected output:

http://websites.com/path/subpath/cc:query2

Also, I would like to select everything up to the first occurrence of ?=param

Input: sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param

Output:

http://websites.com/path/subpath/cc:query2/text/

Wiktor Stribiżew · Accepted Answer

It is a pity that Go regex does not support lookarounds. However, you can obtain the last link with a sort of a trick: match all possible links and other characters greedily and capture the last link with a capturing group:

^(?:https?://|.)*(https?://\S+?)(?:\?=|$)

Together with the lazy non-whitespace pattern \S+?, this also lets you capture the link only up to the ?=.
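As a rough sketch of how the capturing-group trick could be used defensively, the pattern can be wrapped in a small helper that reports whether a link was found at all. The names lastLinkRe and lastLink are made up for this illustration, and the second input is an invented no-match case.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastLinkRe is the pattern from the answer: the greedy prefix swallows
// everything up to the last "http://" or "https://", and group 1 lazily
// captures the link up to "?=" or the end of the string.
var lastLinkRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)

// lastLink is a hypothetical helper: it returns the captured link,
// or false when the input does not match.
func lastLink(s string) (string, bool) {
	m := lastLinkRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true // m[0] is the whole match, m[1] is group 1
}

func main() {
	inputs := []string{
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param",
		"no links here", // invented input without any link
	}
	for _, in := range inputs {
		if link, ok := lastLink(in); ok {
			fmt.Printf("%q\n", link)
		} else {
			fmt.Println("no link found")
		}
	}
}
```

Checking for a nil result avoids the panic that indexing the result directly would cause on input that does not match.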

See regex demo and Go demo

// Needs the "fmt" and "regexp" imports; the Printf calls belong inside a function such as main.
var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
// Group 1 of the first (and only) match holds the last link.
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2", -1)[0][1])
fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param", -1)[0][1])

Results:

"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"

If the last link can contain spaces, use .+? instead of \S+?:

^(?:https?://|.)*(https?://.+?)(?:\?=|$)
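A minimal sketch of this variant, assuming an invented input whose last link has a space in its path. The names spacedLinkRe and lastSpacedLink are hypothetical. Note that in Go . does not match a newline unless the s flag is set, so this sketch assumes single-line input.

```go
package main

import (
	"fmt"
	"regexp"
)

// spacedLinkRe is the variant from the answer: .+? replaces \S+? so the
// captured link may contain spaces.
var spacedLinkRe = regexp.MustCompile(`^(?:https?://|.)*(https?://.+?)(?:\?=|$)`)

// lastSpacedLink is a hypothetical helper returning group 1,
// or false when the input does not match.
func lastSpacedLink(s string) (string, bool) {
	m := spacedLinkRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Made-up input: the last link contains a space and ends before "?=".
	in := "first http://websites.com/a then http://websites.com/my path/page/?=param"
	if link, ok := lastSpacedLink(in); ok {
		fmt.Printf("%q\n", link) // "http://websites.com/my path/page/"
	}
}
```

Because .+? also matches spaces, any text that follows the last link (up to ?= or the end of the string) becomes part of the capture, so this form suits inputs where the link is the final item.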
