Reputation: 621
I need a fast HTML parser written in PHP. I first tried some existing parsers (such as Ganon and QueryPath), but they were too slow for my project. In the end I decided to use PHP's built-in DOMDocument, which was the fastest of them all. It only offers fairly bare-bones methods, though, so I started building my own on top of it.
I'm writing a class that extends DOMElement. New methods like 'addText' work fine, but I have a problem when I want to change the tag name.
To change the tag name, the node has to be replaced with a newly created node. Because that is a different object, any further calls on the original variable no longer affect the node that is in the document.
UPDATE: For now, I've added a return $newNode; in the newTag method and I'm using it like this: $node = $node->newTag('h1'); but for consistency I would really like to use just: $node->newTag('h1');
Please see the code (simplified):
<?php
class my_element extends DOMElement {
    public function __construct() { parent::__construct(); }

    public function newTag($newTagName) {
        $newNode = $this->ownerDocument->createElement($newTagName);
        $this->parentNode->replaceChild($newNode, $this);
        foreach ($this->attributes as $attribute) {
            $newNode->setAttribute($attribute->name, $attribute->value);
        }
        foreach (iterator_to_array($this->childNodes) as $child) {
            $newNode->appendChild($this->removeChild($child));
        }
        //at this point, $newNode should become $this... How???
    }

    //append plain text
    public function addText($text = '') {
        $textNode = $this->ownerDocument->createTextNode($text);
        $this->appendChild($textNode);
    }

    //... some other methods
}

$html = '<div><p></p></div>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
$dom->registerNodeClass("DOMElement", "my_element"); //extend DOMElement class
$nodes = $xPath->query('//p'); //select all 'p' nodes
$node = $nodes->item(0); // get the first

//Start to change the selected node
$node->addText('123');
$node->newTag('h1');
$node->addText('345'); //This is not working because the node has changed!
echo $dom->saveHTML();
This code will output <div><h1>123</h1></div>. As you can see, the text 345 was not added after I changed the tag name.
What can be done in order to continue to work with the selected node? Is it possible to set the new node as the current node in the 'newTag' method?
Upvotes: 0
Views: 356
Reputation: 5463
The ideal solution would be DOMDocument::renameNode(), but it isn't available in PHP yet. Perhaps this would work instead, called as $node = $node->parentNode->renameChild($node, 'h1'):
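<?php
// Extends DOMElement (not DOMNode) so that it can be registered with
// $dom->registerNodeClass('DOMElement', 'MyDOMElement'); only then does
// $node->parentNode return an object that has renameChild().
class MyDOMElement extends DOMElement {
    // Replaces the child $node with a new element called $name and returns it.
    public function renameChild($node, $name) {
        $newNode = $this->ownerDocument->createElement($name);
        // Copy the attributes over to the new element.
        foreach ($node->attributes as $attribute) {
            $newNode->setAttribute($attribute->name, $attribute->value);
        }
        // Move (not copy) the children; appendChild() detaches them from $node.
        while ($node->firstChild) {
            $newNode->appendChild($node->firstChild);
        }
        $this->replaceChild($newNode, $node);
        return $newNode;
    }
}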
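To show how the pieces might fit together, here is a minimal, self-contained sketch of the renameChild() idea applied to markup like the one in the question. It makes a few assumptions: the class is called MyDOMElement (a name chosen for illustration), it extends DOMElement so that registerNodeClass() maps every element, including the parent, to it, and the class attribute in the sample markup is added only to show that attributes are carried over. The caller still has to reassign $node, because the old object is detached from the document after the replacement.

```php
<?php
// Illustrative class name; it extends DOMElement so registerNodeClass() can map elements to it.
class MyDOMElement extends DOMElement
{
    // Replace the child $node with a new element named $name and return the new element.
    public function renameChild(DOMElement $node, string $name): DOMElement
    {
        $newNode = $this->ownerDocument->createElement($name);

        // Copy attributes from the old element to the new one.
        foreach ($node->attributes as $attribute) {
            $newNode->setAttribute($attribute->name, $attribute->value);
        }

        // Move the children; appendChild() removes each one from $node first.
        while ($node->firstChild) {
            $newNode->appendChild($node->firstChild);
        }

        $this->replaceChild($newNode, $node);
        return $newNode;
    }
}

$dom = new DOMDocument;
// Register the subclass so elements are exposed as MyDOMElement objects.
$dom->registerNodeClass('DOMElement', 'MyDOMElement');
$dom->loadHTML('<div><p class="intro">123</p></div>');

$xPath = new DOMXPath($dom);
$node = $xPath->query('//p')->item(0);

// Reassign $node: the returned element is the one that now lives in the document.
$node = $node->parentNode->renameChild($node, 'h1');
$node->appendChild($dom->createTextNode('345'));

echo $dom->saveHTML($node);
```

After the reassignment, $node refers to the h1 element, so the second text node ends up in the document and $node->textContent is '123345'. Note that this does not make $node->newTag('h1') work without a reassignment; it only keeps the variable pointing at the live node.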
Upvotes: 1