I have a string that contains URLs and other text. I want to get all the URLs into the $matches array, but the following code doesn't capture all of them:
$matches = array();
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL";
preg_match_all('$\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]$i', $text, $matches);
print_r($matches);
The code above matches:
http://tinyurl.com/9uxdwc
http://google.com
http://tinyurl.com/787988
but misses the following 5 URLs:
soundfly.us
schoollife.edu
hello.net
news.yahoo.com
en.wikipedia.org/wiki/Country_music
Can you please show me, with an example, how to modify the code above to get all the URLs?
Is this what you need?
$matches = array();
$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL";
preg_match_all('$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i', $text, $matches);
print_r($matches);
I made the protocol part optional, required a literal dot ("\.") separating the domain from the TLD, and added a "+" to the last character class so it captures the full string after that dot (the TLD plus any extra path information).
The result (the full matches in $matches[0]) is:
[0] => soundfly.us
[1] => schoollife.edu
[2] => hello.net
[3] => news.yahoo.com
[4] => http://tinyurl.com/9uxdwc
[5] => http://google.com
[6] => http://tinyurl.com/787988
[7] => en.wikipedia.org/wiki/Country_music
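To make the three changes easier to see, here is a minimal sketch of the same pattern rewritten with the x (extended) modifier, so each part can carry its own comment. The helper name extractUrls() is hypothetical; it just returns $matches[0], which is where preg_match_all() puts the full matches by default (indices 1 and 2 hold the capture groups).

```php
<?php
// Hypothetical helper: extractUrls() is an illustrative name, not a built-in.
function extractUrls(string $text): array
{
    // Same pattern as the answer, written with the x modifier so that
    // whitespace is ignored and "#" outside a character class starts a comment.
    // Inside a character class "#" stays literal, even in extended mode.
    $pattern = '$
        \b
        ((https?|ftp|file)://)?          # scheme is now optional
        [-A-Z0-9+&@#/%?=~_|!:,.;]*       # host and path characters, greedy
        \.                               # a literal dot is required
        [-A-Z0-9+&@#/%=~_|]+             # TLD plus any trailing path, one or more chars
    $ix';

    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches; [1] and [2] hold the capture groups.
    return $matches[0];
}

$text = "soundfly.us schoollife.edu hello.net some random news.yahoo.com text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988 and others will en.wikipedia.org/wiki/Country_music URL";

$urls = extractUrls($text);
print_r($urls);
```

Returning only $matches[0] gives a flat list of URLs instead of the nested array that print_r($matches) shows.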
It also works with IP addresses, because they contain the required dot. I tested it with the strings "192.168.0.1" and "192.168.0.1/test/index.php".
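The IP-address check can be reproduced with a short script. This sketch reuses the single-line pattern from the answer and runs it on the two strings mentioned; $results is just an illustrative variable name.

```php
<?php
// Pattern copied from the answer: optional scheme, mandatory dot.
$pattern = '$\b((https?|ftp|file)://)?[-A-Z0-9+&@#/%?=~_|!:,.;]*\.[-A-Z0-9+&@#/%=~_|]+$i';

$results = [];
foreach (["192.168.0.1", "192.168.0.1/test/index.php"] as $candidate) {
    preg_match_all($pattern, $candidate, $matches);
    // Keep the full matches so each input can be compared with what was found.
    $results[$candidate] = $matches[0];
    echo $candidate, ' => ', implode(', ', $matches[0]), PHP_EOL;
}
```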