XXXXX

Reputation: 3

Java Regexp to match domain of url

I would like to use a Java regex to extract the domain from a URL. For example, for www.table.google.com I would like to get 'google', i.e. the second-to-last word in the URL string.

Any help will be appreciated !!!

Upvotes: 0

Views: 2054

Answers (3)

linden2015

Reputation: 887

My attempt:

(?<scheme>https?:\/\/)?(?<subdomain>\S*?)(?<domainword>[^.\s]+)(?<tld>\.[a-z]+|\.[a-z]{2,3}\.[a-z]{2,3})(?=\/|$)

Demo. Works correctly for:

http://www.foo.stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.com/
http://stackoverflow.com
https://www.stackoverflow.com
www.stackoverflow.com
stackoverflow.com
http://www.stackoverflow.com
http://www.stackoverflow.co.uk
foo.www.stackoverflow.com
foo.www.stackoverflow.co.uk
foo.www.stackoverflow.co.uk/a/b/c
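
In Java, the same pattern could be used roughly as follows. This is only a sketch: the class and method names are made up for illustration, the backslashes are doubled for the string literal, and the escaped slashes are written as plain `/`. Note that the two-part TLD alternative accepts any pair of 2-3 letter labels, so an input such as foo.bar.com would yield "foo" rather than "bar".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainWordRegex {
    // Same regex as above, written as a Java string literal
    private static final Pattern URL = Pattern.compile(
            "(?<scheme>https?://)?(?<subdomain>\\S*?)(?<domainword>[^.\\s]+)"
            + "(?<tld>\\.[a-z]+|\\.[a-z]{2,3}\\.[a-z]{2,3})(?=/|$)");

    // Hypothetical helper: returns the "domainword" group, or empty if nothing matches
    static Optional<String> domainWord(String url) {
        Matcher m = URL.matcher(url);
        return m.find() ? Optional.of(m.group("domainword")) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(domainWord("www.table.google.com").orElse("no match"));
        System.out.println(domainWord("foo.www.stackoverflow.co.uk/a/b/c").orElse("no match"));
    }
}
```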

Upvotes: 1

MoMan

Reputation: 17

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Literal dots must be escaped as \\. inside a Java string
private static final Pattern URL_MATCH_GET_SECOND_AND_LAST =
        Pattern.compile("www\\.(.*)\\.google\\.(.*)", Pattern.CASE_INSENSITIVE);

String sURL = "www.table.google.com";

// Create the matcher once; a second matcher and a second find() are not needed
Matcher matchURL = URL_MATCH_GET_SECOND_AND_LAST.matcher(sURL);

if (matchURL.find()) {
    String sFirst = matchURL.group(1);   // "table"
    String sSecond = matchURL.group(2);  // "com"
}

Upvotes: 0

Zabuzard

Reputation: 25933

It really depends on the complexity of your inputs...

Here is a pretty simple regex:

.+\\.(.+)\\..+

It captures whatever sits between the last two dots (each literal dot is escaped as \\. in a Java string).

And here are some examples for that pattern: https://regex101.com/r/L52oz6/1. As you can see, it works for simple inputs but not for complex URLs.
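
A minimal sketch of how that pattern might be applied in Java; the class and method names are invented for illustration. Because the first .+ is greedy, the captured group is the text between the last two dots, and inputs with fewer than two dots do not match at all.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleDotRegex {
    // The pattern from above, unchanged
    private static final Pattern BETWEEN_LAST_DOTS = Pattern.compile(".+\\.(.+)\\..+");

    // Hypothetical helper: group 1 if the whole input matches, otherwise empty
    static Optional<String> secondToLast(String input) {
        Matcher m = BETWEEN_LAST_DOTS.matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com").orElse("no match"));
        System.out.println(secondToLast("google.com").orElse("no match"));
    }
}
```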

But why reinvent the wheel? There are plenty of really good libraries that correctly parse any complex URL. Sure, for simple inputs a small regex is easily built. So if that does not solve the problem for your inputs, please let me know and I will adjust the regex pattern.
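
As one possible illustration of letting a parser do the work, the sketch below uses the JDK's java.net.URI. That class is swapped in here as an example and is not a library named above; the class and method names are also made up. It assumes a missing scheme can be patched by prepending "http://", since getHost() returns null for scheme-less input. It does not know about multi-part suffixes such as co.uk; handling those would need a public suffix list.

```java
import java.net.URI;
import java.util.Optional;

class HostDomainWord {
    // Hypothetical helper: second-to-last label of the host part of a URL
    static Optional<String> secondToLastLabel(String url) {
        // Assumption: input without a scheme is treated as http
        String withScheme = url.contains("://") ? url : "http://" + url;
        String host = URI.create(withScheme).getHost();
        if (host == null) {
            return Optional.empty();
        }
        String[] labels = host.split("\\.");
        return labels.length >= 2
                ? Optional.of(labels[labels.length - 2])
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Path and query are ignored because only the host is inspected
        System.out.println(secondToLastLabel("https://www.table.google.com/search?q=a.b").orElse("none"));
    }
}
```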


Note that you can also just use simple splitting like:

String[] elements = input.split("\\.");
String secondToLastElement = elements[elements.length - 2];

But don't forget to bounds-check the index.
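
With the bounds check included, the splitting approach might look like the sketch below. The class and method names are invented for illustration, and returning an empty Optional for inputs with fewer than two parts is just one possible choice.

```java
import java.util.Optional;

class SplitDomainWord {
    // Hypothetical helper: second-to-last dot-separated element, if there is one
    static Optional<String> secondToLast(String input) {
        String[] elements = input.split("\\.");
        if (elements.length < 2) {
            // No dot in the input, so there is no second-to-last element
            return Optional.empty();
        }
        return Optional.of(elements[elements.length - 2]);
    }

    public static void main(String[] args) {
        System.out.println(secondToLast("www.table.google.com").orElse("none"));
        System.out.println(secondToLast("localhost").orElse("none"));
    }
}
```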


Or, if you are looking for a very quick solution, walk through the input starting from the last position. Work your way backwards until you find the first dot, then continue until you find the second dot. Then extract that part with input.substring(index1, index2);.

There is already a method for exactly that purpose, namely String#lastIndexOf (see the documentation).

Take a look at this code snippet:

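String input = ...
int indexLastDot = input.lastIndexOf('.');
// lastIndexOf searches backwards from the given index inclusive, so start one before the last dot
int indexSecondToLastDot = input.lastIndexOf('.', indexLastDot - 1);
// substring takes (beginIndex, endIndex); + 1 skips the dot itself
String secondToLastWord = input.substring(indexSecondToLastDot + 1, indexLastDot);

This also works with a single dot, since indexSecondToLastDot is then -1 and the substring starts at 0. But don't forget bounds checking: if the input contains no dot at all, indexLastDot is -1 and substring throws a StringIndexOutOfBoundsException.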

The advantage of this approach is that it is really fast: it scans the string directly, with no regex engine and no intermediate array of parts, and allocates only the final substring.

Upvotes: 1
