trallgorm

Reputation: 253

Regex: Extract everything following 2 characters, or the beginning

I'm trying to extract subdomains+domains from some loosely formatted URLs. Some start with http:// and others do not. I covered the http:// case with the following regex:

(?<=(\/\/))[^\/]*

which matches something like

https://stackoverflow.com/questions/ask

to

stackoverflow.com

which is correct. However, I now want it to match the above case AND

stackoverflow.com/questions/ask

to

stackoverflow.com

I'm using a third-party tool that doesn't clearly state which regex engine it uses. How can this expression be written?

Upvotes: 1

Views: 36

Answers (1)

anubhava

Reputation: 785196

If the tool is Python-based, you may use this regex:

(?:(?<=://)|^)[^/:]+(?!.*://)

The negative lookahead (?!.*://) rejects any match that still has :// somewhere after it, which stops the ^ alternative from matching the scheme (https) at the start of the string.
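To check the behaviour outside the tool, here is a minimal sketch using Python's standard re module. The helper name extract_host and the sample URLs are assumptions made for illustration; the pattern is the one given above.

```python
import re

# Pattern from the answer: the host either follows "://" or starts the string,
# and nothing after the match may still contain "://".
HOST_RE = re.compile(r"(?:(?<=://)|^)[^/:]+(?!.*://)")


def extract_host(url):
    """Return the subdomain+domain part of url, or None if nothing matches.

    extract_host is a hypothetical helper name, not part of any library.
    """
    match = HOST_RE.search(url)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "https://stackoverflow.com/questions/ask",  # with scheme
        "stackoverflow.com/questions/ask",          # without scheme
    ]
    for url in samples:
        print(url, "->", extract_host(url))
```

Both sample inputs should yield stackoverflow.com. Because [^/:]+ also stops at a colon, a port such as :8080 is left out of the match.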

RegEx Demo 1

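Otherwise, if the engine accepts alternatives of different lengths inside a lookbehind (PCRE does, for example), use: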

(?<=:\/\/|^)[^\/:]+(?!.*:\/\/)

RegEx Demo 2
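Since the tool does not say which engine it uses, one way to see why two variants are given is to try compiling both in Python. The sketch below assumes Python's built-in re module, which requires a fixed-width lookbehind; the helper name compiles_in_python_re is hypothetical.

```python
import re

# First variant: the lookbehind contains only "://", so its width is fixed.
FIXED_LOOKBEHIND = r"(?:(?<=://)|^)[^/:]+(?!.*://)"

# Second variant: the lookbehind holds "://" (width 3) or "^" (width 0).
MIXED_LOOKBEHIND = r"(?<=:\/\/|^)[^\/:]+(?!.*:\/\/)"


def compiles_in_python_re(pattern):
    """Report whether Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for a lookbehind whose alternatives differ in width.
        return False


if __name__ == "__main__":
    print("fixed-width variant:", compiles_in_python_re(FIXED_LOOKBEHIND))
    print("mixed-width variant:", compiles_in_python_re(MIXED_LOOKBEHIND))
```

If the second pattern is rejected by the tool as well, its engine probably has the same fixed-width restriction, and the first variant is the one to use.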

Upvotes: 1
