Reputation: 33
I found this great URL matching regexp
from another answer here that gets URL's in a string, but it only works if they are followed by spaces.
preg_replace('#(https?|ftp)://[^ ]+ #i', '', $s['Text']);
How would I modify this so that it will also match URL's that are at the very end of a string with nothing after them?
Upvotes: 3
Views: 168
Reputation: 7791
For matching with all kind of URLs the following code can help you:
<?php
$content = '<html>
<title>Random Website I am Crawling</title>
<body>
Click <a href="http://clicklink.com">here</a> for foobar
Another site is http://foobar.com';
$regex = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,4})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
$matches = array(); //create array
$pattern = "/$regex/";
preg_match_all($pattern, $content, $matches);
print_r(array_values(array_unique($matches[0])));
echo "<br><br>";
echo implode("<br>", array_values(array_unique($matches[0])));
?>
Upvotes: 1