Reputation: 2159
How can I detect whether a piece of text contains an HTML image tag and extract just the URL of the image?
E.g., extract this URL:
http://www.someurl.com/somefileprocessor.php/12345/somedir/somesubdir/someniceimage.jpg
from this tag (the tag may be embedded in other text and/or HTML):
<img title="Some nice title" border="0" hspace="0" alt="some useful hint" src="http://www.someurl.com/somefileprocessor.php/12345/somedir/somesubdir/someniceimage.jpg" width="629" height="464" />
Thanks in advance, Ângelo
Upvotes: 0
Views: 3147
Reputation: 10148
A quick attempt at an <img/> tag-specific regex:
preg_match_all('/<img[^>]*?\s+src\s*=\s*"([^"]+)"[^>]*?>/i', $str, $matches);
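To show how the result might be read back, here is a minimal sketch. The `$str` sample is an assumption, built from the tag in the question; the captured URLs should end up in `$matches[1]`. Note that the pattern only handles double-quoted `src` values.

```php
<?php
// Sample input (assumed): the tag from the question embedded in other text.
$str = 'Some text <img title="Some nice title" border="0" hspace="0" '
     . 'alt="some useful hint" src="http://www.someurl.com/somefileprocessor.php/12345/somedir/somesubdir/someniceimage.jpg" '
     . 'width="629" height="464" /> more text';

// Same pattern as above; group 1 captures the double-quoted src value.
$count = preg_match_all('/<img[^>]*?\s+src\s*=\s*"([^"]+)"[^>]*?>/i', $str, $matches);

// $matches[1] holds one URL per matched <img> tag; fall back to an empty list.
$urls = $count ? $matches[1] : [];
foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

If the text has no img tag, `$urls` is simply an empty array, which also answers the "detect" part of the question.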
Upvotes: 2
Reputation: 2159
Thanks a lot for the answers. While I learn some more PHP, I tried this quick and dirty way; it also extracts the image URL:
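// Keep everything from the first "src" onward.
$imageurl = strstr($title, 'src', false);
// Skip ahead to the opening quote of the attribute value.
$imageurl = strstr($imageurl, '"', false);
// Drop the opening quote (it now sits at position 0).
$imageurlpos = strpos($imageurl, '"');
$imageurl = substr($imageurl, $imageurlpos + 1);
// Cut at the closing quote, leaving just the URL.
$imageurlpos = strpos($imageurl, '"');
$imageurl = substr($imageurl, 0, $imageurlpos);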
Upvotes: 0
Reputation: 4373
You can use cURL to fetch the content and then extract all img tags from it.
To fetch the data with cURL:
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
Then use a regular expression to extract the data:
^https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?]+)+\.(?:jpg|gif|png)$
This matches image URLs whether or not they appear in an img tag.
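Because the pattern is anchored with `^` and `$`, as written it only matches a string that is exactly one URL. Here is a rough sketch of scanning a larger chunk of text instead. It rests on a few assumptions of mine that are not part of the original pattern: the anchors are dropped, `~` delimiters and the `i` flag are added, and the path character class is tightened so a match cannot run across quotes, tags, or whitespace. The `$html` sample is made up.

```php
<?php
// Made-up sample content: one URL inside an img tag, one in plain text.
$html = 'See <img src="http://www.someurl.com/somedir/someniceimage.jpg" /> '
      . 'and also https://example.org/pics/photo.png in the text.';

// The pattern from above without ^ and $, so it can match anywhere in the text.
// [^/#?\s"\'<>] replaces [^/#?] (an assumption) to stop at quotes, tags and spaces.
$pattern = '~https?://(?:[a-z\-]+\.)+[a-z]{2,6}(?:/[^/#?\s"\'<>]+)+\.(?:jpg|gif|png)~i';

// $found[0] lists every full match in order of appearance.
preg_match_all($pattern, $html, $found);
$imageUrls = $found[0];

foreach ($imageUrls as $url) {
    echo $url, PHP_EOL;
}
```

In practice `$html` would be the string returned by `get_data()`.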
If you need a crawler, you can use PHPCrawl.
Upvotes: 1