Reputation: 8422
I found the following pattern for validating a URL here:
/\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;
The author's explanation of this regex is:
(?xi)
\b
(                                       # Capture 1: entire matched URL
  (?:
    https?://                           # http or https protocol
    |                                   # or
    www\d{0,3}[.]                       # "www.", "www1.", "www2." … "www999."
    |                                   # or
    [a-z0-9.\-]+[.][a-z]{2,4}/          # looks like domain name followed by a slash
  )
  (?:                                   # One or more:
    [^\s()<>]+                          # Run of non-space, non-()<>
    |                                   # or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                                   # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                                   # or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
  )
)
The problem is that if I type www.ab, this regex matches and reports it as a valid URL. I want the URL to be required to end with two parts: "foobar" + "." + (minimum 2 characters). How can I modify this regex to match what I need?
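The behaviour described above can be reproduced with a short JavaScript check. This is only a minimal sketch: the variable name `urlPattern` is illustrative and does not come from the original source of the regex.

```javascript
// The pattern quoted above, unchanged. `urlPattern` is an illustrative name.
const urlPattern = /\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/i;

// "www." satisfies the second alternative, "a" the repeated group,
// and "b" the final character class, so the whole input is accepted.
const found = 'www.ab'.match(urlPattern);
console.log(found !== null && found[0] === 'www.ab'); // true
```

Nothing in the pattern requires a second dot after the "www." prefix, which is why the input is accepted.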
Upvotes: 1
Views: 84
Reputation: 328
/\bwww\.\w+\.\w{2,}/
This will match www.any_alphanumeric_combo.two_or_more_alphanumeric_characters (note that \w also matches underscores).
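A quick way to see what this pattern accepts is to run it over a few inputs. The sketch below assumes JavaScript; the name `simplePattern` and the sample strings are illustrative only.

```javascript
// The pattern from this answer; `simplePattern` is an illustrative name.
const simplePattern = /\bwww\.\w+\.\w{2,}/;

// Sample inputs chosen for illustration.
const samples = ['www.ab', 'www.example.com', 'www.example.c', 'www.my-site.com'];

for (const s of samples) {
  console.log(s, simplePattern.test(s));
}
// www.ab false           (only one dot after "www")
// www.example.com true
// www.example.c false    (last part shorter than 2 characters)
// www.my-site.com false  (\w does not match the hyphen)
```

The last case shows a limitation: hyphenated domain names are rejected because \w covers only letters, digits, and underscores.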
Upvotes: 0
Reputation: 39434
You originally indicated you wanted a regular expression to match a three part URL: www, a domain name, and a minimum 2 character TLD. That would be:
(https?://)?[^.]+\.[^.]+\....*
I am using dots here to handle the situation of numbers and non-Latin characters in the domain and the TLD.
If you want to support one or more sub-domains, we can make that regex more generic. Consider:
(https?://)?([^.]+\.)+...*
This still matches www.ab, though -- that's a "valid" URL per the specification of "domain" + "." + "tld" (minimum of two characters). It also matches www.45, but you didn't stipulate what made a TLD valid.
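The difference between the two patterns in this answer can be checked with a small JavaScript sketch. The names `threePart` and `anySubdomains` and the sample inputs are illustrative, and the slashes in "https?://" are escaped only because JavaScript regex literals require it.

```javascript
// Both patterns from this answer as JavaScript literals.
// `threePart` and `anySubdomains` are illustrative names.
const threePart = /(https?:\/\/)?[^.]+\.[^.]+\....*/;
const anySubdomains = /(https?:\/\/)?([^.]+\.)+...*/;

// Sample inputs chosen for illustration.
const inputs = ['www.ab', 'www.45', 'www.example.com', 'http://www.example.co'];

for (const s of inputs) {
  console.log(s, threePart.test(s), anySubdomains.test(s));
}
// www.ab false true
// www.45 false true
// www.example.com true true
// http://www.example.co true true
```

Neither pattern is anchored, so `test` reports a match found anywhere in the input; wrapping a pattern in ^ and $ would be one way to require the whole string to match.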
So ultimately consider following some sage advice:
Which ... regular expressions should you use? That really depends on what you’re trying to do. In many situations, the answer may be to not use any regular expression at all. Simply try to resolve the URL. If it returns valid content, accept it. If you get a 404 or other error, reject it. Ultimately, that’s the only real test to see whether a URL is valid.
Upvotes: 3