Reputation: 2337
Follow-up to this question: Regex to match pattern with subdomain in java
I use the pattern below to match the domain and its subdomains:
Pattern pattern = Pattern.compile("http://([a-z0-9]*.)example.com");
This pattern matches the following:
http://asd.example.com
http://example.example.com
http://www.example.com
but it does not match
http://example.com
Can anyone tell me how to match http://example.com too?
Upvotes: 0
Views: 2137
Reputation: 10803
You can use this regex pattern to extract the domain (with the scheme, if present) from a URL:
\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}
For example:
Input = http://www.google.com/search?q=a
Output = http://www.google.com
Input = ftp://www.google.com/search?q=a
Output = ftp://www.google.com
Input = www.google.com/search?q=a
Output = www.google.com
Here, \p{L}{0,10} matches the scheme (http, https, ftp, and possibly others I am not aware of), (?:://)? matches the :// separator if it is present, and [\p{L}\.]{1,50} matches the host part, such as foo.bar.foo.com. The rest of the URL is cut off.
And here is the Java code that accomplishes the job:
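import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is illustrative; the original snippet omitted the enclosing class.
public class DomainExtractor {
    public static final String DOMAIN_PATTERN = "\\p{L}{0,10}(?:://)?[\\p{L}\\.]{1,50}";

    // Returns the first match (scheme plus host) found in url, or "" if there is none.
    public static String getDomain(String url) {
        if (url == null || url.isEmpty()) {
            return "";
        }
        Pattern p = Pattern.compile(DOMAIN_PATTERN);
        Matcher m = p.matcher(url);
        if (m.find()) {
            return m.group();
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(getDomain("www.google.com/search?q=a"));
    }
}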
Output = www.google.com
Finally, if you want to match only domains ending in "example.com", append it to the pattern and relax the host quantifier to {0,50} so that the bare domain also matches:
\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com
This extracts any domain that ends in "example.com":
Input = http://www.foo.bar.example.com/search?q=a
Output = http://www.foo.bar.example.com
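As a rough check of the example.com variant, here is a minimal sketch that runs it against a few inputs. The class name ExampleDomainDemo and the helper findExampleDomain are made up for illustration and are not part of the answer's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExampleDomainDemo {
    // The example.com variant of the pattern, compiled once and reused.
    static final Pattern EXAMPLE_DOMAIN =
            Pattern.compile("\\p{L}{0,10}(?:://)?[\\p{L}\\.]{0,50}example\\.com");

    // Hypothetical helper: returns the first match in url, or "" if there is none.
    public static String findExampleDomain(String url) {
        Matcher m = EXAMPLE_DOMAIN.matcher(url);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // prints http://www.foo.bar.example.com
        System.out.println(findExampleDomain("http://www.foo.bar.example.com/search?q=a"));
        // prints http://example.com (the {0,50} quantifier lets the host prefix be empty)
        System.out.println(findExampleDomain("http://example.com/index.html"));
        // prints an empty line: the input does not contain example.com
        System.out.println(findExampleDomain("http://www.google.com/search?q=a"));
    }
}
```

Because the sketch uses find() rather than matches(), it looks for the pattern anywhere in the input, which is what allows the path and query string to be ignored.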
Note: \p{Ll} (lowercase Unicode letters) can be used instead of \p{L} (all Unicode letters) if your URLs are always lowercase. Neither class matches digits or hyphens, so hosts such as web1.example.com need those characters added to the character class.
Upvotes: 0
Reputation: 46229
Just make the first part optional with a ?:
Pattern pattern = Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");
Note that . matches any character; you should use \\. to match a literal dot.
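To see the effect of the optional group and the escaped dots, here is a small sketch that tests the pattern against the URLs from the question. The class name SubdomainMatchDemo and the helper isExampleUrl are assumptions made for this example only.

```java
import java.util.regex.Pattern;

public class SubdomainMatchDemo {
    // The subdomain group, including its trailing dot, is optional.
    static final Pattern DOMAIN =
            Pattern.compile("http://([a-z0-9]*\\.)?example\\.com");

    // Hypothetical helper: true when the whole string matches the pattern.
    public static boolean isExampleUrl(String url) {
        return DOMAIN.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com",      // true: the optional group is skipped
            "http://www.example.com",  // true
            "http://asd.example.com",  // true
            "http://exampleXcom"       // false: \\. only accepts a literal dot
        };
        for (String url : urls) {
            System.out.println(url + " -> " + isExampleUrl(url));
        }
    }
}
```

One thing to be aware of: because of the * quantifier, the group also accepts an empty label, so http://.example.com matches as well. Changing [a-z0-9]* to [a-z0-9]+ would rule that out if it matters for your input.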
Upvotes: 1