Reputation: 2107
I have extracted a string value from my SQL table, and it looks like this:
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"
style=\"height:163px; width:650px\" /></p></p>
<p>end of string</p>
I want to get the image name 986dfdea.png from inside the HTML tag and replace the whole tag with a placeholder such as '#image1'. The string contains a lot of <p></p> tags, so I need to be able to tell which one contains an image.
The result should look like this:
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
#image1
<p>end of string</p>
I'm developing an API for mobile apps, but I'm a beginner in PHP and still can't achieve my goal after going through these references:
PHP/regex: How to get the string value of HTML tag?
How to extract img src, title and alt from html using php?
Please help.
Upvotes: 1
Views: 953
Reputation: 98961
Yes, you could use a regex and it would take far less code, but HTML shouldn't be parsed with a regex, so here's what you need:
1. The string contains invalid HTML (the stray </p></p>), so we use tidy_repair_string to clean it.
2. Use DOMXpath() to query for p tags with img tags inside.
3. Strip any stray " and get the image filename with getAttribute("src") and basename.
4. Use createTextNode to create a text node with the value #imagename.
5. Use replaceChild to replace the img inside the p with the text node created above.
6. Remove the !DOCTYPE, html and body tags automatically generated by new DOMDocument();
<?php
$html = <<< EOF
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"
style=\"height:163px; width:650px\" /></p></p>
<p>end of string</p>
EOF;
$html = tidy_repair_string($html,array(
'output-html' => true,
'wrap' => 80,
'show-body-only' => true,
'clean' => true,
'input-encoding' => 'utf8',
'output-encoding' => 'utf8',
));
$dom = new DOMDocument();
$dom->loadHTML($html);
$x = new DOMXPath($dom);
// find every <img> that is a direct child of a <p>
foreach ($x->query('//p/img') as $pImg) {
    // get the image file name, stripping any stray quotes from the src value
    $imgFileName = basename(str_replace('"', "", $pImg->getAttribute("src")));
    // swap the <img> for a text node such as "#986dfdea.png"
    $replace = $dom->createTextNode("#$imgFileName");
    $pImg->parentNode->replaceChild($replace, $pImg);
}
// the cleanup below runs once, after the loop; inside the loop it would
// run again for every image and break on the second one
# loadHTML causes a !DOCTYPE tag to be added, so remove it:
$dom->removeChild($dom->firstChild);
# it also wraps the code in <html><body></body></html>, so remove that:
$dom->replaceChild($dom->firstChild->firstChild, $dom->firstChild);
echo str_replace(array("<body>", "</body>"), "", $dom->saveHTML());
Output:
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p>#986dfdea.png</p>
<p>end of string</p>
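The output above keeps the surrounding p and uses the file name as the marker. If you would rather have the whole p replaced by a numbered placeholder such as #image1, as in the question, a variation along these lines might work. This is only a sketch under some assumptions: the function name replaceImageParagraphs and the returned array layout are made up for illustration, the input is assumed to already have the \" sequences unescaped to plain quotes, and it relies on libxml tolerating the stray </p> instead of running tidy first.

```php
<?php
// Hypothetical helper (not part of the answer's code above): replaces every
// <p> that contains an <img> with a "#imageN" placeholder and records which
// file name each placeholder stands for.
function replaceImageParagraphs(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings (e.g. for the stray </p>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $images = [];
    $counter = 0;

    // Select <p> elements that have an <img> somewhere inside them.
    foreach ($xpath->query('//p[.//img]') as $p) {
        $img = $xpath->query('.//img', $p)->item(0);
        $counter++;
        $placeholder = "#image$counter";
        $images[$placeholder] = basename($img->getAttribute('src'));
        // Replace the whole <p>, not only the <img>.
        $p->parentNode->replaceChild($dom->createTextNode($placeholder), $p);
    }

    // Serialize only the children of <body>, so the generated DOCTYPE, html
    // and body wrappers never end up in the result.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }

    return ['html' => $out, 'images' => $images];
}

// Usage with the string from the question (quotes assumed already unescaped).
$sample = <<<'EOF'
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt="" src="ckeditor/plugins/imageuploader/uploads/986dfdea.png"
style="height:163px; width:650px" /></p></p>
<p>end of string</p>
EOF;

$result = replaceImageParagraphs($sample);
echo $result['html'];
print_r($result['images']);
```

For the sample string, $result['images'] maps #image1 to 986dfdea.png, so the mobile app can look up the file behind each placeholder. Serializing the body's child nodes one by one is a design choice that avoids the removeChild/replaceChild cleanup steps.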
Upvotes: 3