Reputation: 5969
$bits = preg_split('#((?:https?|ftp)://[^\s\'"<>()]+)#S', $token->data, -1, PREG_SPLIT_DELIM_CAPTURE);
Say I'm trying to match URLs that need to be linkified. The pattern above is too permissive.
I want to match only bare URLs like http://google.com,
but not <a href="http://google.com">http://google.com</a>
or <iframe src="http://google.com"></iframe>.
Upvotes: 1
Views: 222
Reputation: 7413
A more effective regex:
(?:https?|ftp):\/\/[a-zA-Z0-9.\-]+
Result
Array
(
[0] => Array
(
[0] => ftp://article-stack.com
[1] => http://google.com
)
)
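For reference, output of that shape could come from preg_match_all. This is a minimal sketch, not the answerer's actual script: the input string is made up (the answer does not show what text it was run on), and the pattern is a variant with an explicit scheme list and at least one host character required.

```php
<?php
// Hypothetical input text; the original answer does not show its input.
$text = 'Mirror at ftp://article-stack.com, search at http://google.com today.';

// Explicit scheme alternation; '+' requires at least one host character.
$pattern = '#(?:https?|ftp)://[a-zA-Z0-9.\-]+#';

// $matches[0] receives every full match found in $text.
preg_match_all($pattern, $text, $matches);
print_r($matches);
```

Note that a pattern like this only looks at the characters of the URL itself, so it would still match a URL sitting inside an href or src attribute.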
Upvotes: 0
Reputation: 7413
RE
http:\/\/[a-zA-Z0-9\.\-]*
Result
Array
(
[0] => http://google.com
)
Upvotes: 0
Reputation: 398
Try this:
function validUrl($url) {
    $regex  = '#(^';                                                   // match[1]: the whole URL
    $regex .= '((https?|ftps?)://)?';                                  // match[2]: optional scheme
    $regex .= '(([0-9a-z-]+\.)+';                                      // match[4]: full domain; match[5]: last label before the TLD
    $regex .= '([a-z]{2,3}|aero|coop|jobs|mobi|museum|name|travel))';  // match[6]: TLD
    $regex .= '(:[0-9]{1,5})?';                                        // match[7]: optional port
    $regex .= '(/[^ ]*)?';                                             // match[8]: optional path and query
    $regex .= '$)#i';
    if (!preg_match($regex, $url, $matches)) {
        return false;
    }
    // gethostbyname() returns the hostname unchanged when the lookup fails; it never returns FALSE.
    if (gethostbyname($matches[4]) === $matches[4]) {
        return false;
    }
    return $matches;
}
Upvotes: 0
Reputation: 31339
It appears that you're trying to parse HTML using regular expressions. You might want to rethink that.
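One way to act on that advice is to let an HTML parser find the text nodes and run the regex only on those. The following is a rough sketch, assuming PHP's bundled DOM extension is available; linkifyHtml() and the "wrap" id are hypothetical names chosen for illustration, and the pattern is the one from the question.

```php
<?php
// Sketch: linkify bare URLs in text nodes only, skipping text already inside <a>.
// linkifyHtml() is a hypothetical helper name, not an existing library function.
function linkifyHtml(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings about sloppy markup quiet
    // Wrap the fragment so it can be extracted again; the XML hint marks the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Attribute values (href, src) are not text nodes, so they are never touched.
    $nodes = $xpath->query('//div[@id="wrap"]//text()[not(ancestor::a)]');

    $pattern = '#((?:https?|ftp)://[^\s\'"<>()]+)#S';
    foreach (iterator_to_array($nodes) as $node) {
        $bits = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($bits) < 2) {
            continue; // no URL in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($bits as $i => $bit) {
            if ($i % 2 === 1) {
                // Odd indexes hold the captured URLs.
                $a = $doc->createElement('a');
                $a->setAttribute('href', $bit);
                $a->appendChild($doc->createTextNode($bit));
                $frag->appendChild($a);
            } elseif ($bit !== '') {
                $frag->appendChild($doc->createTextNode($bit));
            }
        }
        $node->parentNode->replaceChild($frag, $node);
    }

    // Serialize only the children of the wrapper, not the html/body scaffolding.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkifyHtml('See http://google.com and <a href="http://google.com">http://google.com</a>');
```

Text inside script or style elements would likely need to be excluded as well; the XPath filter can be extended with further not(ancestor::...) tests for that.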
Upvotes: 2