Reputation: 3618
Code:
$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$urls = array();
preg_match($pattern, $comment, $urls);
return $urls;
According to an online regex tester, this regex is correct and should work.
I am outputting the $links array using:
$linkItems = $model->getLinksInComment($model->comments);
//die(print_r($linkItems));
echo '<ul>';
foreach($linkItems as $link) {
echo '<li><a href="'.$link.'">'.$link.'</a></li>';
}
echo '</ul>';
The output looks like the following:
The $model->comments looks like the following:
destined for surplus
RT#83015
RT#83617
http://google.com
https://google.com
non-link
The generated list is only supposed to contain links, and there should be no empty lines. Is there something wrong with what I did? The regex seems to be correct.
Upvotes: 0
Views: 125
Reputation: 4193
If I'm understanding right, you should use preg_match_all in your getLinksInComment function instead:
preg_match_all($pattern, $comment, $matches);
if (isset($matches[0])) {
return $matches[0];
}
return array(); #in case there are no matches
preg_match_all gets all matches in a string (even if the string contains newlines) and puts them into the array you supply as the third argument. However, anything matched by your regex's capture groups (e.g. (http|https|ftp|ftps)) will also be put into your $matches array (as $matches[1] and so on). That's why you want to return just $matches[0] as your final array of matches.
I just ran this exact code:
$line = "destined for surplus\n
RT#83015\n
RT#83617\n
http://google.com\n
https://google.com\n
non-link";
$pattern = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all($pattern, $line, $matches);
var_dump($matches);
and got this for my output:
array(3) {
[0]=>
array(2) {
[0]=>
string(17) "http://google.com"
[1]=>
string(18) "https://google.com"
}
[1]=>
array(2) {
[0]=>
string(4) "http"
[1]=>
string(5) "https"
}
[2]=>
array(2) {
[0]=>
string(0) ""
[1]=>
string(0) ""
}
}
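If you'd rather not have the extra $matches[1] and $matches[2] entries at all, one option is to make the groups non-capturing with (?:...). The following is only a sketch: it assumes getLinksInComment can be written as a standalone function (in the question it is a method on the model), and it swaps the delimiter to # so the slashes don't need escaping. The pattern is otherwise meant to match the same URLs as the original.

```php
<?php
// Hypothetical standalone version of getLinksInComment(); in the question
// this is a method on the model, so adjust accordingly.
function getLinksInComment($comment)
{
    // (?:...) groups without capturing, so $matches only contains index 0.
    // '#' is the delimiter here, which avoids escaping every '/'.
    $pattern = '#(?:https?|ftps?)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*)?#';

    // preg_match_all returns the number of full matches, or false on error.
    if (preg_match_all($pattern, $comment, $matches) === false) {
        return array();
    }

    return $matches[0]; // an empty array when nothing matched
}

$comment = "destined for surplus\nRT#83015\nRT#83617\n"
         . "http://google.com\nhttps://google.com\nnon-link";

print_r(getLinksInComment($comment));
```

With non-capturing groups, var_dump($matches) would show a single sub-array, so there is nothing extra to filter out before returning.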
Upvotes: 1
Reputation: 3780
Your comment is structured as multiple lines, some of which contain the URLs in which you're interested and nothing else. This being the case, you need not use anything remotely resembling that disaster of a regex to try to pick URLs out of the full comment text; you can instead split by newline, and examine each line individually to see whether it contains a URL. You might therefore implement a much more reliable getLinksInComment() thus:
function getLinksInComment($comment) {
    $links = array();
    foreach (preg_split('/\r?\n/', $comment) as $line) {
        if (!preg_match('/^http/', $line)) { continue; }
        array_push($links, $line);
    }
    return $links;
}
With suitable adjustment to serve as an object method instead of a bare function, this should solve your problem entirely and free you to go about your day.
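For what it's worth, the method version might look roughly like the sketch below. CommentModel is a made-up class name standing in for whatever your model actually is. The trim() call and the check for all four schemes from your original regex are my own assumptions, not something your data necessarily requires.

```php
<?php
// CommentModel is a hypothetical stand-in for the asker's model class.
class CommentModel
{
    public $comments = '';

    public function getLinksInComment($comment)
    {
        $links = array();
        foreach (preg_split('/\r?\n/', $comment) as $line) {
            // Assumption: tolerate stray whitespace around a URL.
            $line = trim($line);
            // Assumption: a link line starts with one of the schemes
            // listed in the question's regex (http, https, ftp, ftps).
            if (preg_match('#^(?:https?|ftps?)://#', $line)) {
                $links[] = $line;
            }
        }
        return $links;
    }
}

$model = new CommentModel();
$model->comments = "destined for surplus\nRT#83015\nRT#83617\n"
                 . "http://google.com\nhttps://google.com\nnon-link";
$linkItems = $model->getLinksInComment($model->comments);
print_r($linkItems);
```

Note that this only picks up lines that begin with a URL; a link embedded in the middle of a sentence would be skipped, which is fine for comments shaped like the one in the question.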
Upvotes: 0