Reputation: 31307
I've been given the following regular expression:
if (preg_match_all("'(http://)?(www[.])?(youtube|vimeo)[^\s]+'is", $prova, $n))
{
    foreach ($n[3] as $key => $site)
    {
        $video_links[$site][] = $n[0][$key];
    }
}
However, if I have a string like:
"hello, look at my vimeo video here: http://www.vimeo.com..../ very nice hm?"
Instead of receiving only the URL, I'm also getting the word vimeo.
I believe the regular expression is retrieving more than it should, and I would like to retrieve ONLY the URLs that it finds, not every reference to "vimeo" or "youtube".
Can I request your help in narrowing the scope of this expression so that only the URLs are retrieved?
Upvotes: 0
Views: 510
Reputation: 145482
The first question mark (?) in the regex is unneeded. It makes the preceding http:// group optional, so the pattern also matches the bare word vimeo in the text. Try:
preg_match_all("'(http://)(www[.])?(youtube|vimeo)[.][^\s]+'is",
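To see the difference, here is a minimal sketch that runs the original pattern and the stricter one against the same text. The sample sentence and the URL in it are made up for illustration. Note that the original pattern only picks up the bare word when at least one non-whitespace character follows it, such as a comma.

```php
<?php
// Hypothetical sample text; the URL is invented for this demo.
$text = "I love vimeo, see http://www.vimeo.com/12345 now";

// Original pattern: both prefixes are optional, so "vimeo," matches as well.
preg_match_all("'(http://)?(www[.])?(youtube|vimeo)[^\s]+'is", $text, $loose);

// Stricter pattern: http:// is required and a dot must follow the site name.
preg_match_all("'(http://)(www[.])?(youtube|vimeo)[.][^\s]+'is", $text, $strict);

print_r($loose[0]);  // "vimeo," and "http://www.vimeo.com/12345"
print_r($strict[0]); // only "http://www.vimeo.com/12345"
```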
Tip: add (?<![,.)]) at the end of the pattern if you want to exclude typical trailing punctuation, which often breaks such URL searches.
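A small sketch of the lookbehind tip, assuming it is appended to the stricter pattern that requires http://. The sample sentence and URLs are invented. The greedy [^\s]+ first swallows the trailing comma, parenthesis, or period, and the lookbehind then forces it to back off until the match no longer ends in one of those characters.

```php
<?php
// Hypothetical sample text with punctuation glued to the URLs.
$text = "See http://www.vimeo.com/12345, or (http://youtube.com/watch?v=abc).";

// (?<![,.)]) rejects a match whose last character is a comma, dot, or ")".
$pattern = "'(http://)(www[.])?(youtube|vimeo)[.][^\s]+(?<![,.)])'is";

preg_match_all($pattern, $text, $n);

print_r($n[0]); // "http://www.vimeo.com/12345" and "http://youtube.com/watch?v=abc"
```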
As an alternative, with http:// and www. optional, but requiring a path to be present:
preg_match_all("'(http://|www[.])*(youtube|vimeo)[.]\w+/[^\s]+'is",
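As a rough illustration of this variant, the sketch below uses an invented sentence that mentions a bare domain, a scheme-less link with a path, and a full URL. Only the two entries that carry a path after the domain should be returned.

```php
<?php
// Hypothetical sample text; the links are made up for this demo.
$text = "vimeo.com is great, watch vimeo.com/12345 or http://www.youtube.com/watch?v=abc today";

// Prefixes are optional, but "\w+/" demands a top-level domain followed by a slash.
$pattern = "'(http://|www[.])*(youtube|vimeo)[.]\w+/[^\s]+'is";

preg_match_all($pattern, $text, $n);

print_r($n[0]); // "vimeo.com/12345" and "http://www.youtube.com/watch?v=abc"
```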
Upvotes: 2
Reputation: 507
Maybe the following code can help out a bit:
<?php
// Test string
$prova = "\"hello, look at my <strong>vimeo</strong> video here: <a href=\"http://www.vimeo.com..../\" rel=\"nofollow\">http://www.vimeo.com..../</a> very nice hm?\"";
$prova .= " vimeo vimeo.com/something?id=somethingcrazy&testing=true ";
// Start with an empty result so print_r() works even when nothing matches
$video_links = array();
// If we match, capture all matches
if (preg_match_all("'(http://)?(www\.)?(youtube|vimeo)\.([a-z0-9_/?&+=.]+)'is", $prova, $n)) {
    foreach ($n[0] as $key => $site) {
        // For each match of the whole pattern,
        // save the match under its own URL
        $video_links[$site][] = $n[0][$key];
    }
}
// Display results
print_r($video_links);
?>
This will not match the bare word vimeo. It will match vimeo.com/something?id=somethingcrazy&testing=true once, and it will match http://www.vimeo.com..../ twice: once in the href attribute and once in the link text.
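If the duplicate is unwanted, one possible variation (not part of the code above) is to group the matches by site name, as the loop in the question does, and skip URLs that were already stored. This is only a sketch: the test string is invented, and the ?? operator assumes PHP 7 or later.

```php
<?php
// Hypothetical test string: the same URL appears in the href and in the link text.
$prova = 'vimeo <a href="http://www.vimeo.com/1">http://www.vimeo.com/1</a> and youtube.com/watch?v=abc';

$video_links = [];
if (preg_match_all("'(http://)?(www\.)?(youtube|vimeo)\.([a-z0-9_/?&+=.]+)'is", $prova, $n)) {
    foreach ($n[3] as $key => $site) {
        $site = strtolower($site); // normalise the key, since the pattern is case-insensitive
        $url  = $n[0][$key];       // full text of this match
        // Store each URL only once per site
        if (!in_array($url, $video_links[$site] ?? [], true)) {
            $video_links[$site][] = $url;
        }
    }
}

print_r($video_links); // one URL under "vimeo", one under "youtube"
```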
Upvotes: 1