Reputation: 14123
I've been trying, both on my own and by searching online, to write this regular expression, but without success.
I need to validate that a given URL is from a specific domain and a well-formed link (in PHP). For example:
Good domain: example.com
Good URLs (from example.com):
Bad URLs (not from example.com):
Some notes: I don't care about "http" versus "https", but if it matters to you, assume "http" always. The code that will use this regex is PHP, so extra points for that.
UPDATE 2010:
Gruber has posted a great URL regex:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
See his post: An Improved Liberal, Accurate Regex Pattern for Matching URLs
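Since the question asks for PHP, here is a minimal sketch of running Gruber's pattern with preg_match_all() to pull URLs out of free text. The `~` delimiters, the nowdoc, the `u` modifier and the sample text are my own choices, not part of Gruber's post. Note that this pattern only finds URLs; it does not restrict them to one domain.

```php
<?php
// Gruber's pattern, copied verbatim between ~ delimiters. A nowdoc is used so
// that neither the quotes nor the backslashes in the pattern need escaping.
// The u modifier makes PCRE treat pattern and subject as UTF-8, so the curly
// quotes and guillemets in the last character class are single characters
// rather than individual bytes.
$gruberUrl = <<<'RE'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
RE;

// Hypothetical sample text; the trailing period after the first URL is
// excluded by the pattern's final character class.
$text = 'Read http://blah.com/so/this/is/good. Also see www.example.org/page for more.';

preg_match_all($gruberUrl, $text, $matches);
print_r($matches[0]);
```

The final character class is what keeps sentence punctuation such as the period after "good" out of the match.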
Reputation: 53310
Perhaps:
^https?://[^/]*blah\.com(|/.*)$
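Edit: to protect against hosts such as http://editblah.com, which the first pattern accepts, require either nothing or a dot directly before the domain: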
^https?://(([^/]*\.)|)blah\.com(|/.*)$
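A quick way to see what the edit changes is to run both patterns side by side. This is a minimal sketch; the `#` delimiters and the sample URLs (including the hypothetical host evil.example) are my own choices.

```php
<?php
// The two patterns from this answer. '#' is used as the delimiter so the
// slashes inside the patterns need no escaping.
$loose  = '#^https?://[^/]*blah\.com(|/.*)$#';
$strict = '#^https?://(([^/]*\.)|)blah\.com(|/.*)$#';

$urls = [
    'http://blah.com/so/this/is/good',
    'http://www.blah.com/page.html',
    'http://editblah.com/',
    'http://evil.example?x=.blah.com', // hypothetical look-alike URL
];

foreach ($urls as $url) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf("%-35s loose=%d strict=%d\n", $url, preg_match($loose, $url), preg_match($strict, $url));
}
```

editblah.com is accepted by the first pattern and rejected by the second. Because `[^/]*` can still run across characters such as `?`, a URL like http://evil.example?x=.blah.com (whose real host is evil.example) is accepted by both, so checking the parsed host is more robust.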
Reputation: 105868
My stab at it:
<?php
// Optional subdomains ("www.", "any.sub-domain.", ...), then blah.com, then an
// optional path. The i modifier makes the match case-insensitive, since host
// names are case-insensitive.
$pattern = '#^https?://([a-z0-9-]+\.)*blah\.com(/.*)?$#i';

$tests = array(
    'http://blah.com/so/this/is/good'
  , 'http://blah.com/so/this/is/good/index.html'
  , 'http://www.blah.com/so/this/is/good/mice.html#anchortag'
  , 'http://anysubdomain.blah.com/so/this/is/good/wow.php'
  , 'http://anysubdomain.blah.com/so/this/is/good/wow.php?search=doozy'
  , 'http://any.sub-domain.blah.com/so/this/is/good/wow.php?search=doozy' // I added this case
  , 'http://999.sub-domain.blah.com/so/this/is/good/wow.php?search=doozy' // I added this case
  , 'http://obviousexample.com'
  , 'http://bbc.co.uk/blah.com/whatever/you/get/the/idea'
  , 'http://blah.com.example'
  , 'not/even/a/blah.com/url'
);

foreach ( $tests as $test )
{
    if ( preg_match( $pattern, $test ) )
    {
        echo $test, " <strong>matched!</strong><br>";
    } else {
        echo $test, " <strong>did not match.</strong><br>";
    }
}

// Here's another way: validate the URL first, then check only its host.
echo '<hr>';
foreach ( $tests as $test )
{
    // filter_var() returns the URL on success and false on failure.
    if ( $filtered = filter_var( $test, FILTER_VALIDATE_URL ) )
    {
        $host = parse_url( $filtered, PHP_URL_HOST );
        // Require the start of the host or a dot before blah.com, so that
        // e.g. "notblah.com" is not accepted just because it ends in "blah.com".
        if ( $host && preg_match( '/(^|\.)blah\.com$/i', $host ) )
        {
            echo $filtered, " <strong>matched!</strong><br>";
        } else {
            echo $filtered, " <strong>did not match.</strong><br>";
        }
    } else {
        echo $test, " <strong>did not match.</strong><br>";
    }
}
Reputation: 3292
Do you have to use a regex? PHP has a lot of built-in functions for doing this kind of thing.
filter_var($url, FILTER_VALIDATE_URL)
returns the URL if it is valid and false otherwise, and
$domain = parse_url($url, PHP_URL_HOST);
returns the host name it refers to (for example www.blah.com).
It might be clearer and more maintainable than some mad regex.
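Putting the two calls together, a minimal sketch of a domain check could look like this. The function name `isUrlFromDomain()` is hypothetical, and restricting the scheme to http/https is my own assumption based on the question.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when $url
// is a well-formed http(s) URL whose host is $domain or one of its subdomains.
function isUrlFromDomain(string $url, string $domain): bool
{
    // filter_var() returns the URL on success and false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Assumption: only http and https count as acceptable schemes.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!in_array(strtolower((string) $scheme), ['http', 'https'], true)) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // Host names are case-insensitive.
    $host   = strtolower($host);
    $domain = strtolower($domain);

    // Accept the exact domain, or anything ending in "." . $domain.
    return $host === $domain
        || substr($host, -strlen($domain) - 1) === '.' . $domain;
}

var_dump(isUrlFromDomain('http://www.blah.com/so/this/is/good/mice.html#anchortag', 'blah.com'));
var_dump(isUrlFromDomain('http://bbc.co.uk/blah.com/whatever/you/get/the/idea', 'blah.com'));
```

Comparing against a leading dot plus the domain, rather than a bare suffix, is what rejects look-alike hosts such as notblah.com, and looking only at the parsed host means a domain name in the path (as in the bbc.co.uk example) is ignored.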
Reputation: 19651
\b(https?)://([-A-Z0-9]+\.)*blah\.com(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?
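To use this pattern in PHP, here is a minimal sketch. It assumes the `i` modifier (the character classes only list A-Z) and escapes the dot in `blah.com` (an unescaped `.` matches any character, so blahXcom would also pass). The anchored variant is my own addition: without `^` and `$`, preg_match() only has to find the pattern somewhere in the string, so a bad URL such as http://blah.com.example is accepted because its prefix http://blah.com matches.

```php
<?php
// The pattern above with the dot escaped. Braces are used as delimiters
// because the pattern itself contains both # and /.
$unanchored = '{\b(https?)://([-A-Z0-9]+\.)*blah\.com(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?}i';

// Same pattern anchored with ^ and $, so the whole string has to match
// (the leading \b is redundant once ^ is present).
$anchored = '{^(https?)://([-A-Z0-9]+\.)*blah\.com(/[-A-Z0-9+&@#/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#/%=~_|!:,.;]*)?$}i';

$urls = [
    'http://anysubdomain.blah.com/so/this/is/good/wow.php?search=doozy',
    'http://blah.com.example',
];

foreach ($urls as $url) {
    printf("%s unanchored=%d anchored=%d\n", $url, preg_match($unanchored, $url), preg_match($anchored, $url));
}
```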