Reputation: 116
I wrote a regex to validate URLs. It works fine for most URLs, but it fails for the URL below.
My regex:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([?=\/\w \.-]*)\/?$/
https://www.facebook.com/permalink.php?story_fbid=802451379821615&id=298161773583914&pnref=story
How can I make it work for all URLs?
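The specific failure is easy to pin down: the last character class `[?=\/\w \.-]` does not contain `&`, so the `&id=...` part of the query string cannot match. A quick check in JavaScript (assumed runtime: Node.js or a browser console; `withAmpersand` is just an illustrative name for the patched pattern):

```javascript
// The regex from the question, unchanged.
const original = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([?=\/\w \.-]*)\/?$/;

// Same pattern with "&" added to the final (path/query) character class.
const withAmpersand = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([?=&\/\w \.-]*)\/?$/;

const url = "https://www.facebook.com/permalink.php?story_fbid=802451379821615&id=298161773583914&pnref=story";

console.log(original.test(url));      // false: "&" is not in [?=\/\w \.-]
console.log(withAmpersand.test(url)); // true
```

Adding `&` only patches this one URL: characters such as `%`, `#`, `:` and `+` are still rejected in the path and query, so it is not a general fix.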
Upvotes: 1
Views: 729
Reputation: 20286
There is no need to write a regex; just use:
filter_var($url, FILTER_VALIDATE_URL);
Validates value as URL (according to RFC 2396, http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.
For JavaScript, check:
https://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/uri.js
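A simpler option in current JavaScript (a substitution, not the library linked above) is the standard `URL` constructor built into modern browsers and Node.js. A minimal sketch, which also applies the protocol check the PHP documentation warns about; `isHttpUrl` is a hypothetical helper name:

```javascript
// Returns true if `input` parses as an absolute URL whose scheme is http or https.
// Uses the WHATWG URL parser built into browsers and Node.js.
function isHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws a TypeError if input is not a valid absolute URL
  } catch {
    return false;
  }
  // A parseable URL may still use another scheme (mailto:, ssh:, ftp:), so check it explicitly.
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(isHttpUrl("https://www.facebook.com/permalink.php?story_fbid=802451379821615&id=298161773583914&pnref=story")); // true
console.log(isHttpUrl("mailto:someone@example.com")); // false
console.log(isHttpUrl("www.facebook.com"));           // false (no scheme)
```

Unlike the PHP filter, this parser accepts internationalized host names such as http://bébé.fr/ (it converts them to punycode). It is still only a syntax check: it does not tell you whether the host exists.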
Upvotes: 0
Reputation: 18807
Your regex would have to handle cases such as IPv4 and IPv6 addresses and UTF-8 (international) characters, for example:
IPv4: http://192.168.1.1/test.htm
IPv6: http://[2a00:1450:4007:806::1007]/!voilà
International characters: http://bébé.fr/
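To make the point concrete, the regex from the question rejects all three of these examples. A quick JavaScript check (the `cases` object is only for illustration):

```javascript
// The regex from the question, unchanged.
const questionRegex = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([?=\/\w \.-]*)\/?$/;

const cases = {
  ipv4: "http://192.168.1.1/test.htm",              // TLD group [a-z.] cannot start with a digit
  ipv6: "http://[2a00:1450:4007:806::1007]/!voilà", // "[" is not allowed in the host class
  international: "http://bébé.fr/",                 // "é" is not in [\da-z.-]
};

for (const [name, url] of Object.entries(cases)) {
  console.log(name, questionRegex.test(url)); // false for every case
}
```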
There are many complex possibilities, in fact, so a better approach is to test only the protocol and the hostname:
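function isValidHttpUrl($url) {
    // Capture the scheme and the host (everything up to the first '/', '?' or '#')
    if (preg_match('#^(https?)://([^/?\#]+)#', $url, $out)) {
        $host = trim($out[2], '[]'); // strip the brackets around an IPv6 literal
        // Accept IP literals; for names, gethostbyname() returns its input unchanged on failure
        if (filter_var($host, FILTER_VALIDATE_IP) || gethostbyname($host) !== $host) {
            return 1;
        }
    }
    return 0;
}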
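An even simpler solution is to skip regular expressions altogether and use PHP's parse_url() function, which splits each of these URLs into its components (scheme, host, path, query) so you can check the parts you care about.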
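For JavaScript, a rough counterpart to parse_url() is reading the properties of a parsed `URL` object. A minimal sketch; the `components` helper and its field names are hypothetical, chosen to mirror parse_url()'s keys:

```javascript
// Split a URL into components with the WHATWG URL parser,
// roughly like PHP's parse_url() (scheme, host, path, query).
function components(input) {
  const u = new URL(input); // throws a TypeError on unparseable input
  return {
    scheme: u.protocol.slice(0, -1), // "https:" -> "https"
    host: u.hostname,                // IPv6 hosts keep their brackets
    path: u.pathname,
    query: u.search.slice(1),        // "?a=1" -> "a=1", "" stays ""
  };
}

const fb = components("https://www.facebook.com/permalink.php?story_fbid=802451379821615&id=298161773583914&pnref=story");
console.log(fb.host);  // "www.facebook.com"
console.log(fb.query); // "story_fbid=802451379821615&id=298161773583914&pnref=story"

console.log(components("http://192.168.1.1/test.htm").host);              // "192.168.1.1"
console.log(components("http://[2a00:1450:4007:806::1007]/!voilà").host); // "[2a00:1450:4007:806::1007]"
```

One difference from parse_url(): this parser normalizes as it goes (it lowercases the host, converts bébé.fr to punycode and percent-encodes non-ASCII path characters), so compare against normalized values.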
Upvotes: 0
Reputation: 76
My understanding is that trying to cater for every possible valid URL leads to major headaches. However, based on this resource, there is a fairly simple regex that should handle most edge cases.
Try this one by @stephenhay; it works for your example, at least:
^(https?|ftp):\/\/[^\s/$.?#].[^\s]*$
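A quick JavaScript check of this pattern against the question's URL and the harder cases from the other answer. Note how permissive it is: it also accepts `http://-.-`, since it only requires a scheme followed by at least two non-whitespace characters.

```javascript
// @stephenhay's pattern as a JavaScript regex literal ("/" escaped inside the class).
const stephenhay = /^(https?|ftp):\/\/[^\s\/$.?#].[^\s]*$/;

const samples = [
  "https://www.facebook.com/permalink.php?story_fbid=802451379821615&id=298161773583914&pnref=story", // true
  "http://192.168.1.1/test.htm",              // true
  "http://[2a00:1450:4007:806::1007]/!voilà", // true
  "http://bébé.fr/",                          // true
  "http://-.-",                               // true (too permissive)
  "www.facebook.com",                         // false (no scheme)
  "http://exa mple.com",                      // false (whitespace)
];

for (const s of samples) {
  console.log(s, stephenhay.test(s));
}
```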
Upvotes: 2