Regex: Extract everything following 2 characters, or the beginning

Question

I'm trying to extract subdomains+domains from some loosely formatted URLs. Some start with http:// and others do not. I covered the http:// case with the following regex:

(?<=(\/\/))[^\/]*

which matches something like

https://stackoverflow.com/questions/ask

to

stackoverflow.com

which is correct. However, I now want it to match both the above case and

stackoverflow.com/questions/ask

to

stackoverflow.com

I'm using a third-party tool that doesn't clearly state which regex engine it uses. How can this expression be written?

anubhava · Accepted Answer

If the tool is Python-based, you may use this regex:

(?:(?<=://)|^)[^/:]+(?!.*://)

The negative lookahead (?!.*://) rejects any candidate match that still has :// somewhere ahead of it, which prevents https at the start of the string from being matched.

RegEx Demo 1
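
As a minimal sketch of how this pattern behaves, assuming the tool really does use Python's re module (the question does not confirm this), the following runs it against the two inputs from the question. The helper name extract_host is made up for illustration.

```python
import re

# Pattern from the answer: start either right after "://" or at the
# beginning of the string, take characters up to the next "/" or ":",
# and reject the candidate if "://" still appears later in the string.
HOST_RE = re.compile(r"(?:(?<=://)|^)[^/:]+(?!.*://)")


def extract_host(url):
    """Return the first host-like match in url, or None if nothing matches."""
    match = HOST_RE.search(url)
    return match.group(0) if match else None


print(extract_host("https://stackoverflow.com/questions/ask"))  # stackoverflow.com
print(extract_host("stackoverflow.com/questions/ask"))          # stackoverflow.com
```

With the scheme present, the attempt at position 0 fails because every prefix of https is followed by ://, so the engine moves on until the lookbehind succeeds just after the ://.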

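Otherwise, if the engine accepts alternatives of different lengths inside a lookbehind (Python's re module does not), use: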

(?<=:\/\/|^)[^\/:]+(?!.*:\/\/)

RegEx Demo 2
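
To illustrate why the Python case needs its own pattern: Python's re module requires every lookbehind to have a fixed width, and in this second form the lookbehind alternates between :// (three characters) and ^ (zero characters). A small check, using a hypothetical helper name, shows which of the two forms re will compile:

```python
import re

# The two patterns from the answer.
PYTHON_FRIENDLY = r"(?:(?<=://)|^)[^/:]+(?!.*://)"
ALTERNATION_IN_LOOKBEHIND = r"(?<=:\/\/|^)[^\/:]+(?!.*:\/\/)"


def compiles_in_python_re(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised here for lookbehinds whose alternatives differ in width.
        return False


print(compiles_in_python_re(PYTHON_FRIENDLY))
print(compiles_in_python_re(ALTERNATION_IN_LOOKBEHIND))
```

Moving the ^ out of the lookbehind, as the first pattern does, leaves a lookbehind of exactly three characters, which is why that form is the one suggested for Python.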
