Reputation: 2051
I've been looking for a simple regex for URLs, does anybody have one handy that works well? I didn't find one with the zend framework validation classes and have seen several implementations.
Upvotes: 136
Views: 195040
Reputation: 84593
I used this on a few projects, I don't believe I've run into issues, but I'm sure it's not exhaustive:
$text = preg_replace(
'#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i',
"'<a href=\"$1\" target=\"_blank\">$3</a>$4'",
$text
);
Most of the random junk at the end is to deal with situations like http://domain.example.
in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up but since it worked. I've more or less just copied it over from project to project.
Upvotes: 88
Reputation: 787
The best URL Regex that worked for me:
function valid_URL($url){
return preg_match('%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu', $url);
}
Examples:
valid_URL('https://twitter.com'); // true
valid_URL('http://twitter.com'); // true
valid_URL('http://twitter.co'); // true
valid_URL('http://t.co'); // true
valid_URL('http://twitter.c'); // false
valid_URL('htt://twitter.com'); // false
valid_URL('http://example.com/?a=1&b=2&c=3'); // true
valid_URL('http://127.0.0.1'); // true
valid_URL(''); // false
valid_URL(1); // false
Source: http://urlregex.com/
Upvotes: 10
Reputation: 942
For anyone developing with WordPress, just use
esc_url_raw($url) === $url
to validate a URL (here's WordPress' documentation on esc_url_raw
). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL)
because it is unicode and XSS-safe. (Here is a good article mentioning all the problems with filter_var
).
Upvotes: 1
Reputation: 91
"/(http(s?):\/\/)([a-z0-9\-]+\.)+[a-z]{2,4}(\.[a-z]{2,4})*(\/[^ ]+)*/i"
(http(s?)://) means http:// or https://
([a-z0-9-]+.)+ => 2.0[a-z0-9-] means any a-z character or any 0-9 or (-)sign)
2.1 (+) means the character can be one or more ex: a1w,
a9-,c559s, f)
2.2 \. is (.)sign
2.3. the (+) sign after ([a-z0-9\-]+\.) mean do 2.1,2.2,2.3
at least 1 time
ex: abc.defgh0.ig, aa.b.ced.f.gh. also in case www.yyy.com
3.[a-z]{2,4} mean a-z at least 2 character but not more than
4 characters for check that there will not be
the case
ex: https://www.google.co.kr.asdsdagfsdfsf
4.(\.[a-z]{2,4})*(\/[^ ]+)* mean
4.1 \.[a-z]{2,4} means like number 3 but start with
(.)sign
4.2 * means (\.[a-z]{2,4})can be use or not use never mind
4.3 \/ means \
4.4 [^ ] means any character except blank
4.5 (+) means do 4.3,4.4,4.5 at least 1 times
4.6 (*) after (\/[^ ]+) mean use 4.3 - 4.5 or not use
no problem
use for case https://stackoverflow.com/posts/51441301/edit
5. when you use regex write in "/ /" so it come
"/(http(s?)://)([a-z0-9-]+.)+[a-z]{2,4}(.[a-z]{2,4})(/[^ ]+)/i"
6. almost forgot: letter i on the back mean ignore case of
Big letter or small letter ex: A same as a, SoRRy same
as sorry.
Note : Sorry for bad English. My country not use it well.
Upvotes: 2
Reputation: 2950
Here's a simple class for URL Validation using RegEx and then cross-references the domain against popular RBL (Realtime Blackhole Lists) servers:
Install:
require 'URLValidation.php';
Usage:
require 'URLValidation.php';
$urlVal = new UrlValidation(); //Create Object Instance
Add a URL as the parameter of the domain()
method and check the the return.
$urlArray = ['http://www.bokranzr.com/test.php?test=foo&test=dfdf', 'https://en-gb.facebook.com', 'https://www.google.com'];
foreach ($urlArray as $k=>$v) {
echo var_dump($urlVal->domain($v)) . ' URL: ' . $v . '<br>';
}
Output:
bool(false) URL: http://www.bokranzr.com/test.php?test=foo&test=dfdf
bool(true) URL: https://en-gb.facebook.com
bool(true) URL: https://www.google.com
As you can see above, www.bokranzr.com is listed as malicious website via an RBL so the domain was returned as false.
Upvotes: 0
Reputation: 10755
Inspired in this .NET StackOverflow question and in this referenced article from that question there is this URI validator (URI means it validates both URL and URN).
if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
throw new \RuntimeException( "URI has not a valid format." );
}
I have successfully unit-tested this function inside a ValueObject I made named Uri
and tested by UriTest
.
<?php
declare( strict_types = 1 );
namespace XaviMontero\ThrasherPortage\Tests\Tour;
use XaviMontero\ThrasherPortage\Tour\Uri;
class UriTest extends \PHPUnit_Framework_TestCase
{
private $sut;
public function testCreationIsOfProperClassWhenUriIsValid()
{
$sut = new Uri( 'http://example.com' );
$this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
}
/**
* @dataProvider urlIsValidProvider
* @dataProvider urnIsValidProvider
*/
public function testGetUriAsStringWhenUriIsValid( string $uri )
{
$sut = new Uri( $uri );
$actual = $sut->getUriAsString();
$this->assertInternalType( 'string', $actual );
$this->assertEquals( $uri, $actual );
}
public function urlIsValidProvider()
{
return
[
[ 'http://example-server' ],
[ 'http://example.com' ],
[ 'http://example.com/' ],
[ 'http://subdomain.example.com/path/?parameter1=value1¶meter2=value2' ],
[ 'random-protocol://example.com' ],
[ 'http://example.com:80' ],
[ 'http://example.com?no-path-separator' ],
[ 'http://example.com/pa%20th/' ],
[ 'ftp://example.org/resource.txt' ],
[ 'file://../../../relative/path/needs/protocol/resource.txt' ],
[ 'http://example.com/#one-fragment' ],
[ 'http://example.edu:8080#one-fragment' ],
];
}
public function urnIsValidProvider()
{
return
[
[ 'urn:isbn:0-486-27557-4' ],
[ 'urn:example:mammal:monotreme:echidna' ],
[ 'urn:mpeg:mpeg7:schema:2001' ],
[ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
[ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
[ 'urn:FOO:a123,456' ]
];
}
/**
* @dataProvider urlIsNotValidProvider
* @dataProvider urnIsNotValidProvider
*/
public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
{
$this->expectException( 'RuntimeException' );
$this->sut = new Uri( $uri );
}
public function urlIsNotValidProvider()
{
return
[
[ 'only-text' ],
[ 'http//missing.colon.example.com/path/?parameter1=value1¶meter2=value2' ],
[ 'missing.protocol.example.com/path/' ],
[ 'http://example.com\\bad-separator' ],
[ 'http://example.com|bad-separator' ],
[ 'ht tp://example.com' ],
[ 'http://exampl e.com' ],
[ 'http://example.com/pa th/' ],
[ '../../../relative/path/needs/protocol/resource.txt' ],
[ 'http://example.com/#two-fragments#not-allowed' ],
[ 'http://example.edu:portMustBeANumber#one-fragment' ],
];
}
public function urnIsNotValidProvider()
{
return
[
[ 'urn:mpeg:mpeg7:sch ema:2001' ],
[ 'urn|mpeg:mpeg7:schema:2001' ],
[ 'urn?mpeg:mpeg7:schema:2001' ],
[ 'urn%mpeg:mpeg7:schema:2001' ],
[ 'urn#mpeg:mpeg7:schema:2001' ],
];
}
}
<?php
declare( strict_types = 1 );
namespace XaviMontero\ThrasherPortage\Tour;
class Uri
{
/** @var string */
private $uri;
public function __construct( string $uri )
{
$this->assertUriIsCorrect( $uri );
$this->uri = $uri;
}
public function getUriAsString()
{
return $this->uri;
}
private function assertUriIsCorrect( string $uri )
{
// https://stackoverflow.com/questions/30847/regex-to-validate-uris
// http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/
if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
throw new \RuntimeException( "URI has not a valid format." );
}
}
}
There are 65 assertions in 46 tests. Caution: there are 2 data-providers for valid and 2 more for invalid expressions. One is for URLs and the other for URNs. If you are using a version of PhpUnit of v5.6* or earlier then you need to join the two data providers into a single one.
xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.
.............................................. 46 / 46 (100%)
Time: 82 ms, Memory: 4.00MB
OK (46 tests, 65 assertions)
There's is 100% of code-coverage in this sample URI checker.
Upvotes: 3
Reputation: 9049
OK, so this is a little bit more complex then a simple regex, but it allows for different types of urls.
Examples:
All which should be marked as valid.
function is_valid_url($url) {
// First check: is the url just a domain name? (allow a slash at the end)
$_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
if (preg_match($_domain_regex, $url)) {
return true;
}
// Second: Check if it's a url with a scheme and all
$_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
if (preg_match($_regex, $url, $matches)) {
// pull out the domain name, and make sure that the domain is valid.
$_parts = parse_url($url);
if (!in_array($_parts['scheme'], array( 'http', 'https' )))
return false;
// Check the domain using the regex, stops domains like "-example.com" passing through
if (!preg_match($_domain_regex, $_parts['host']))
return false;
// This domain looks pretty valid. Only way to check it now is to download it!
return true;
}
return false;
}
Note that there is a in_array check for the protocols that you want to allow (currently only http and https are in that list).
var_dump(is_valid_url('google.com')); // true
var_dump(is_valid_url('google.com/')); // true
var_dump(is_valid_url('http://google.com')); // true
var_dump(is_valid_url('http://google.com/')); // true
var_dump(is_valid_url('https://google.com')); // true
Upvotes: 1
Reputation: 2703
Use the filter_var()
function to validate whether a string is URL or not:
var_dump(filter_var('example.com', FILTER_VALIDATE_URL));
It is bad practice to use regular expressions when not necessary.
EDIT: Be careful, this solution is not unicode-safe and not XSS-safe. If you need a complex validation, maybe it's better to look somewhere else.
Upvotes: 215
Reputation: 958
There is a PHP native function for that:
$url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1¶m2/';
if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) {
// Wrong
}
else {
// Valid
}
Returns the filtered data, or FALSE if the filter fails.
Upvotes: -1
Reputation: 3750
Here is the way I did it. But I want to mentoin that I am not so shure about the regex. But It should work thou :)
$pattern = "#((http|https)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|”|\"|'|:|\<|$|\.\s)#i";
$text = preg_replace_callback($pattern,function($m){
return "<a href=\"$m[1]\" target=\"_blank\">$m[1]</a>$m[4]";
},
$text);
This way you won't need the eval marker on your pattern.
Hope it helps :)
Upvotes: 0
Reputation: 563
And there is your answer =) Try to break it, you can't!!!
function link_validate_url($text) {
$LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';
$LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
"æ", // æ
"Æ", // Æ
"À", // À
"à", // à
"Á", // Á
"á", // á
"Â", // Â
"â", // â
"å", // å
"Å", // Å
"ä", // ä
"Ä", // Ä
"Ç", // Ç
"ç", // ç
"Ð", // Ð
"ð", // ð
"È", // È
"è", // è
"É", // É
"é", // é
"Ê", // Ê
"ê", // ê
"Ë", // Ë
"ë", // ë
"Î", // Î
"î", // î
"Ï", // Ï
"ï", // ï
"ø", // ø
"Ø", // Ø
"ö", // ö
"Ö", // Ö
"Ô", // Ô
"ô", // ô
"Õ", // Õ
"õ", // õ
"Œ", // Œ
"œ", // œ
"ü", // ü
"Ü", // Ü
"Ù", // Ù
"ù", // ù
"Û", // Û
"û", // û
"Ÿ", // Ÿ
"ÿ", // ÿ
"Ñ", // Ñ
"ñ", // ñ
"þ", // þ
"Þ", // Þ
"ý", // ý
"Ý", // Ý
"¿", // ¿
)), ENT_QUOTES, 'UTF-8');
$LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
"ß", // ß
)), ENT_QUOTES, 'UTF-8');
$allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');
// Starting a parenthesis group with (?: means that it is grouped, but is not captured
$protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
$authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
$domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
$ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
$ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
$port = '(?::([0-9]{1,5}))';
// Pattern specific to external links.
$external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .' |localhost)'. $port .'?';
// Pattern specific to internal links.
$internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
$internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";
$directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
// Yes, four backslashes == a single backslash.
$query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
$anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";
// The rest of the path for a standard URL.
$end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';
$message_id = '[^@].*@'. $domain;
$newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
$news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';
$user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
$email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';
if (strpos($text, '<front>') === 0) {
return false;
}
if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
return false;
}
if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
return false;
}
if (preg_match($internal_pattern . $end, $text)) {
return false;
}
if (preg_match($external_pattern . $end, $text)) {
return false;
}
if (preg_match($internal_pattern_file, $text)) {
return false;
}
return true;
}
Upvotes: 5
Reputation: 222
function validateURL($URL) {
$pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
$pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
return true;
} else{
return false;
}
}
Upvotes: 7
Reputation: 9
I've found this to be the most useful for matching a URL..
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$
Upvotes: -1
Reputation: 197
I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities and even if you did, there is still a chance that url simply doesn't exist.
Here is a very simple way to test if url actually exists and is readable :
if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";
(if there is no preg_match
then this would also validate all filenames on your server)
Upvotes: 11
Reputation: 11317
function is_valid_url ($url="") {
if ($url=="") {
$url=$this->url;
}
$url = @parse_url($url);
if ( ! $url) {
return false;
}
$url = array_map('trim', $url);
$url['port'] = (!isset($url['port'])) ? 80 : (int)$url['port'];
$path = (isset($url['path'])) ? $url['path'] : '';
if ($path == '') {
$path = '/';
}
$path .= ( isset ( $url['query'] ) ) ? "?$url[query]" : '';
if ( isset ( $url['host'] ) AND $url['host'] != gethostbyname ( $url['host'] ) ) {
if ( PHP_VERSION >= 5 ) {
$headers = get_headers("$url[scheme]://$url[host]:$url[port]$path");
}
else {
$fp = fsockopen($url['host'], $url['port'], $errno, $errstr, 30);
if ( ! $fp ) {
return false;
}
fputs($fp, "HEAD $path HTTP/1.1\r\nHost: $url[host]\r\n\r\n");
$headers = fread ( $fp, 128 );
fclose ( $fp );
}
$headers = ( is_array ( $headers ) ) ? implode ( "\n", $headers ) : $headers;
return ( bool ) preg_match ( '#^HTTP/.*\s+[(200|301|302)]+\s#i', $headers );
}
return false;
}
Upvotes: 4
Reputation: 5068
As per John Gruber (Daring Fireball):
Regex:
(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))
using in preg_match():
preg_match("/(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/", $url)
Here is the extended regex pattern (with comments):
(?xi)
\b
( # Capture 1: entire matched URL
(?:
https?:// # http or https protocol
| # or
www\d{0,3}[.] # "www.", "www1.", "www2." … "www999."
| # or
[a-z0-9.\-]+[.][a-z]{2,4}/ # looks like domain name followed by a slash
)
(?: # One or more:
[^\s()<>]+ # Run of non-space, non-()<>
| # or
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
)+
(?: # End with:
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) # balanced parens, up to 2 levels
| # or
[^\s`!()\[\]{};:'".,<>?«»“”‘’] # not a space or one of these punct chars
)
)
For more details please look at: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
Upvotes: 16
Reputation: 8586
Just in case you want to know if the url really exists:
function url_exist($url){//se passar a URL existe
$c=curl_init();
curl_setopt($c,CURLOPT_URL,$url);
curl_setopt($c,CURLOPT_HEADER,1);//get the header
curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//don't use a cached version of the url
if(!curl_exec($c)){
//echo $url.' inexists';
return false;
}else{
//echo $url.' exists';
return true;
}
//$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
//return ($httpcode<400);
}
Upvotes: 13
Reputation: 25165
Edit:
As incidence pointed out this code has been DEPRECATED with the release of PHP 5.3.0 (2009-06-30) and should be used accordingly.
Just my two cents but I've developed this function and have been using it for a while with success. It's well documented and separated so you can easily change it.
// Checks if string is a URL
// @param string $url
// @return bool
function isURL($url = NULL) {
if($url==NULL) return false;
$protocol = '(http://|https://)';
$allowed = '([a-z0-9]([-a-z0-9]*[a-z0-9]+)?)';
$regex = "^". $protocol . // must include the protocol
'(' . $allowed . '{1,63}\.)+'. // 1 or several sub domains with a max of 63 chars
'[a-z]' . '{2,6}'; // followed by a TLD
if(eregi($regex, $url)==true) return true;
else return false;
}
Upvotes: 5
Reputation: 2699
Peter's Regex doesn't look right to me for many reasons. It allows all kinds of special characters in the domain name and doesn't test for much.
Frankie's function looks good to me and you can build a good regex from the components if you don't want a function, like so:
^(http://|https://)(([a-z0-9]([-a-z0-9]*[a-z0-9]+)?){1,63}\.)+[a-z]{2,6}
Untested but I think that should work.
Also, Owen's answer doesn't look 100% either. I took the domain part of the regex and tested it on a Regex tester tool http://erik.eae.net/playground/regexp/regexp.html
I put the following line:
(\S*?\.\S*?)
in the "regexp" section and the following line:
-hello.com
under the "sample text" section.
The result allowed the minus character through. Because \S means any non-space character.
Note the regex from Frankie handles the minus because it has this part for the first character:
[a-z0-9]
Which won't allow the minus or any other special character.
Upvotes: 0
Reputation: 9327
As per the PHP manual - parse_url should not be used to validate a URL.
Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL)
does not perform any better.
Both parse_url()
and filter_var()
will pass malformed URLs such as http://...
Therefore in this case - regex is the better method.
Upvotes: 33
Reputation: 105914
I've used this one with good success - I don't remember where I got it from
$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
Upvotes: 8