Reputation: 331
In a browser, I want to extract the subdomain and domain name of the page I am on, minus the top-level parts like 'com' and 'co.uk'.
Also, if the subdomain is 'www', I don't want a match on that.
Examples:
https://www.voice-1.mozilla.co.uk/folder/index.html
https://www.voice-1.mozilla.org.uk/folder/index.html
http://www.voice-1.mozilla.com/folder/index.html
http://www.voice-1.mozilla.com:8080/folder/index.html
These should all produce the matches 'voice-1' and 'mozilla'.
It would be nice not to have to maintain a list of top-level domains, but maintaining different variations of 'www' is okay.
So far my regex skips 'com' and 'co.uk', but not 'www' or 'org.uk', and it also matches anything else that comes before a '.' in the file path: regex-test
The regex is now:
/[\w\-]{3,}(?=[.])/g
How can I achieve this?
Edit:
Having a step after the regex that trims away an unwanted 'www', the 'co' in 'co.uk', and the 'org' in 'org.uk' is okay. But I still need the top level removed, along with anything else before a '.' in the file path. Basically, I want to grab everything between '//' and the first '/', except the top-level domain.
Upvotes: 1
Views: 432
Reputation: 3573
I managed to get this, which gets rid of 'www' and 'index':
\.([\w\-]{3,})(?=[\.])
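Here is a minimal sketch of how that pattern could be applied in JavaScript, assuming the page URL is available as a string (in a browser that would be location.href). The helper name labelsFromRegex is made up for illustration. The wanted text ends up in capture group 1, so matchAll is used to collect it. Note that 'org' in 'org.uk' is three characters long and therefore still matches, so it would have to be trimmed in a later step.

```javascript
// Hypothetical helper: collect capture group 1 of the regex above.
function labelsFromRegex(url) {
  const pattern = /\.([\w\-]{3,})(?=[\.])/g;
  return Array.from(url.matchAll(pattern), (m) => m[1]);
}

console.log(labelsFromRegex('https://www.voice-1.mozilla.co.uk/folder/index.html'));
// → [ 'voice-1', 'mozilla' ]
console.log(labelsFromRegex('https://www.voice-1.mozilla.org.uk/folder/index.html'));
// → [ 'voice-1', 'mozilla', 'org' ]  ('org' still needs trimming)
```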
If string methods are allowed, you can try something like this.
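const str = 'https://www.voice-1.mozilla.co.uk/folder/index.html';
const arr = str.split('/');       // ['https:', '', 'www.voice-1.mozilla.co.uk', 'folder', 'index.html']
const result = arr[2].split('.'); // ['www', 'voice-1', 'mozilla', 'co', 'uk']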
You will get every part separately in result. You need to check the first element (is it 'www' or not), and likewise the last two elements (check length and content). I don't think there is any pattern you can use here.
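The checks described above could be sketched roughly as follows. This is only one possible way to do it, under some assumptions: the helper name domainParts and the SECOND_LEVEL list are invented for illustration, the list is hand-maintained and not exhaustive, and the sketch swaps str.split('/') for the standard URL object's hostname property so that a port such as ':8080' does not end up in the last element.

```javascript
// Assumption: a small hand-maintained list of labels that can sit directly
// before a two-letter country code, as in 'co.uk' or 'org.uk'.
const SECOND_LEVEL = ['co', 'org', 'ac', 'gov', 'net'];

// Hypothetical helper following the checks described above.
function domainParts(url) {
  // hostname excludes the protocol, the port and the path.
  const labels = new URL(url).hostname.split('.');

  // First element: drop it if it is 'www'.
  if (labels[0] === 'www') labels.shift();

  // Last element: always the top level ('com', 'uk', ...).
  const tld = labels.pop();

  // Second-to-last element: drop it too if it looks like the 'co' in 'co.uk'.
  const last = labels[labels.length - 1];
  if (tld.length === 2 && labels.length > 1 && SECOND_LEVEL.includes(last)) {
    labels.pop();
  }
  return labels;
}

console.log(domainParts('http://www.voice-1.mozilla.com:8080/folder/index.html'));
// → [ 'voice-1', 'mozilla' ]
```

In a browser, domainParts(location.href) would apply the same checks to the current page.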
Upvotes: 1