Reputation: 253
I'm trying to extract subdomains and domains from some loosely formatted URLs. Some start with http:// and others do not. I covered the http:// case with the following regex:
(?<=(\/\/))[^\/]*
Given an input like
https://stackoverflow.com/questions/ask
it matches
stackoverflow.com
which is correct. However, I now want it to handle the case above AND reduce
stackoverflow.com/questions/ask
to
stackoverflow.com
I'm using a third-party tool that doesn't clearly state which regex engine it uses. How can this expression be written?
Upvotes: 1
Views: 36
Reputation: 785196
If the tool is Python-based, you can use this regex:
(?:(?<=://)|^)[^/:]+(?!.*://)
The negative lookahead (?!.*://) prevents a match when :// appears later in the string, which avoids matching https at the start.
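As a quick check, here is a minimal sketch that runs the Python-flavored pattern against the two inputs from the question. The names HOST_RE and extract_host are made up for illustration, and it assumes the tool behaves like Python's re.search, which may not be true of the third-party tool.

```python
import re

# Pattern from the answer: begin right after "://" or at the start of the
# string, take characters up to the next "/" or ":", and reject any candidate
# that still has "://" somewhere after it (this rules out the scheme itself).
HOST_RE = re.compile(r"(?:(?<=://)|^)[^/:]+(?!.*://)")


def extract_host(url):
    """Return the host part of a loosely formatted URL, or None if no match.

    Hypothetical helper for illustration; not part of any library.
    """
    match = HOST_RE.search(url)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "https://stackoverflow.com/questions/ask",
        "stackoverflow.com/questions/ask",
    ]
    for url in samples:
        print(url, "->", extract_host(url))
```

Both sample inputs should come back as stackoverflow.com. Because [^/:]+ also stops at a colon, a port such as :8080 would be left out of the match.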
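Otherwise, if the engine allows alternatives of different lengths inside a lookbehind (Python's re module does not), use: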
(?<=:\/\/|^)[^\/:]+(?!.*:\/\/)
Upvotes: 1