khalid

DOMDocument saveHTML() adding extra quotes and some other URL-encoded characters

I have been using PHP's DOMDocument extension to find image tags with a missing or empty alt attribute. Here is the HTML I am using for testing:

<span style="font-weight:bold;">Blender</span> is an Open Source 3D modelling and animation software. 
This is a very popular software among hobbyists.<i>Blender</i> has a vast list of features which include bones and meshing, textures, particle physics etc.
<u>Blender</u> was originally a proprietary software which was eventually made opensource. 
Blender is known to be difficult to learn because its interface is very intimiding to a newbie. 
But on the other hand, <a href="http://www.blender.org">Blender</a> is so much customizable that you can actually modify your workspace according to your personal preference. 
Also blender interface has been developed in the OpenGL graphics library, so blender looks all the same on all platforms whether you use Windows, Linux, BSD or even Mac. 
3D is a very interesting field to work with but 3D is somewhat tough to start with. You can <a href="http://www.google.com"" target="_blank">Google</a> for numerous tutorials on Blender. 
There are quite some awesome websites dedicated to blender development, such as BlenderGuru.com. <img src="http://www.cochinsquare.com/wp-content/uploads/2010/08/Blender.jpg">

And here is the DOMDocument code I was using to find each IMG tag and add an alt attribute to it:

$dom=new DOMDocument();
$dom->loadHTML($content);
$dom->formatOutput = true;
$imgs = $dom->getElementsByTagName("img");
foreach($imgs as $img){
 $alt = $img->getAttribute('alt');
 if ($alt == ''){
  $k_alt = $this->keyword;    
 }else{
  $k_alt = $alt;
 }
 $img->setAttribute( 'alt' , $k_alt );
}
$html_mod = preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $dom->saveHTML()));
return $html_mod;

And here is the HTML I get back:

<span style='"font-weight:bold;"'>Blender</span> is an Open Source 3D modelling and animation software. 
This is a very popular software among hobbyists.<i>Blender</i> has a vast list of features which include bones and meshing, textures, particle physics etc.
<u>Blender</u> was originally a proprietary software which was eventually made opensource. 
Blender is known to be difficult to learn because its interface is very intimiding to a newbie. 
But on the other hand, <a href=""http://www.blender.org"">Blender</a> is so much customizable that you can actually modify your workspace according to your personal preference. 
Also blender interface has been developed in the OpenGL graphics library, so blender looks all the same on all platforms whether you use Windows, Linux, BSD or even Mac. 
3D is a very interesting field to work with but 3D is somewhat tough to start with. You can <a href=""http://www.google.com""" target='"_blank"'>Google</a> for numerous tutorials on Blender. 
There are quite some awesome websites dedicated to blender development, such as BlenderGuru.com. 
<img src=""http://www.cochinsquare.com/wp-content/uploads/2010/08/Blender.jpg"" alt="Blender">

Observe the extra quotes (single as well as double) in the img src, in the anchor tags, and in the style attribute of the span.

Please help! I want the HTML returned intact, with only the new alt attribute added.

I should also mention that I am using PHP 5.3.2 with the Suhosin Patch on Ubuntu 10.04.

Upvotes: 2

Views: 1031

Answers (1)

I finally figured out how to solve this problem and want to share my solution with you.

To avoid the extra quotes, run html_entity_decode() on the result of saveHTML(), for example:

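$filecontent = file_get_contents('file.html');
$doc = new DOMDocument();
$doc->loadHTML($filecontent);
$xpath = new DOMXPath($doc);
// Select the element whose id attribute is "bg" (note the @ before id).
$node = $xpath->query("//*[@id='bg']")->item(0);
if ($node !== null) {
    $node->nodeValue = 'asd';
}
// Decode the entities that saveHTML() emits before writing the file back.
$filecontent = html_entity_decode($doc->saveHTML());
file_put_contents('file.html', $filecontent);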

This leaves clean HTML in the $filecontent variable, without the extra quotes. You are welcome!
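
Applied to the task in the question (adding an alt attribute to images that lack one), the same idea might look roughly like the sketch below. The function name addMissingAlt and the $fallbackAlt parameter are made up for illustration and do not come from the question's code. The sketch also assumes the input is a UTF-8 HTML fragment.

```php
<?php
// Hypothetical helper: the name and parameters are illustrative only.
function addMissingAlt($html, $fallbackAlt)
{
    $doc = new DOMDocument();

    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        // getAttribute() returns '' when the attribute is absent or empty.
        if (trim($img->getAttribute('alt')) === '') {
            $img->setAttribute('alt', $fallbackAlt);
        }
    }

    // Decode entities in the serialized document, as suggested above.
    return html_entity_decode($doc->saveHTML(), ENT_QUOTES, 'UTF-8');
}

$result = addMissingAlt('<p>Hi <img src="a.jpg"></p>', 'Blender');
echo $result;
```

Keep in mind that html_entity_decode() decodes every entity in the output, including ones such as &lt; inside text, so it is probably only safe when the content has no entities that need to stay encoded.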

Upvotes: 1
