Reputation: 3909
I'm trying to parse an HTML snippet, using the PHP DOM functions. I have stripped out everything apart from paragraph, span and line break tags, and now I want to retrieve all the text, along with its accompanying styles.
So I'd like to get each piece of text, one by one, and then, for each one, go back up the tree to get the values of particular attributes (I'm only interested in a few specific ones, like color).
How can I do this? Or am I thinking about it the wrong way?
Upvotes: 5
Views: 1729
Reputation: 342635
For those who are more comfortable with CSS selectors, and are willing to include a single extra PHP file in their project, I would suggest the PHP Simple HTML DOM Parser. The solution would look something like the following:
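// Requires the PHP Simple HTML DOM Parser, which defines file_get_html()
require_once 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/');
// find() takes a CSS-style selector; the comma matches several tags at once
$ret = $html->find('p, span');
$store = array();
foreach ($ret as $element) {
    // Attributes are read as properties; a missing attribute yields false.
    // innertext keeps nested markup; use ->plaintext for the text only.
    $store[] = array($element->tag => array('text'  => $element->innertext,
                                            'color' => $element->color,
                                            'style' => $element->style));
}
print_r($store);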
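For comparison, a rough equivalent that needs no extra file can be built on PHP's own DOM extension, using an XPath union instead of a CSS selector. This is only a sketch: the function name collectElements() and the sample HTML string are made up for illustration, and it reads the inline style attribute rather than a color attribute.

```php
<?php
// Sketch: collect tag, text and inline style of every <p> and <span>
// using only PHP's built-in DOM extension (no third-party parser).
// collectElements() and the sample input are hypothetical examples.

function collectElements(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about malformed or unknown markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $store = [];
    // The "|" union selects both element types, returned in document order.
    foreach ($xpath->query('//p | //span') as $element) {
        $store[] = [
            'tag'   => $element->nodeName,
            'text'  => $element->textContent,          // text of the element and its descendants
            'style' => $element->getAttribute('style'), // '' when the attribute is absent
        ];
    }
    return $store;
}

$html = '<p style="color:red">Hello <span style="color:blue">world</span></p>';
print_r(collectElements($html));
```

Unlike innertext, textContent never contains markup, which is closer to what the question asks for.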
Upvotes: 3
Reputation: 11354
Suppose you have a DOMDocument here:
$doc = new DOMDocument();
$doc->loadHTMLFile('http://stackoverflow.com/');
You can find all text nodes using a simple XPath query:
$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//text()');
Just foreach over the result to iterate over all text nodes:
foreach ($textNodes as $textNode) {
echo $textNode->data . "\n";
}
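Since the question deals with a snippet rather than a page, a variant might load it from a string with loadHTML() and skip whitespace-only nodes with a normalize-space() predicate. This is a sketch; the textNodes() helper and the sample markup are hypothetical.

```php
<?php
// Sketch: list the non-blank text nodes of an HTML snippet held in a string.
// textNodes() and the sample markup are hypothetical examples.

function textNodes(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $texts = [];
    // normalize-space() is empty (false) for whitespace-only nodes,
    // so this predicate skips the blank text between tags.
    foreach ($xpath->query('//text()[normalize-space()]') as $textNode) {
        $texts[] = $textNode->data;
    }
    return $texts;
}

$html = "<p>First <span>second</span><br>\n third</p>";
foreach (textNodes($html) as $text) {
    echo trim($text) . "\n";
}
```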
From each text node, you can go up the DOM tree by using ->parentNode.
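As an illustration of walking up with ->parentNode, the sketch below looks for the nearest ancestor whose inline style sets a given CSS property, such as color. It assumes the styles live in style attributes, and its naive parsing (splitting on ";" and ":") ignores edge cases like semicolons inside url(...); inheritedStyle() is a hypothetical helper, not part of the DOM API.

```php
<?php
// Sketch: for a text node, walk up ->parentNode until an element's inline
// style declares $property. Assumes inline style attributes; the naive
// split on ";" and ":" does not handle every CSS edge case.

function inheritedStyle(DOMNode $node, string $property): ?string
{
    // Stop when the chain reaches the DOMDocument, which is not a DOMElement.
    for ($n = $node->parentNode; $n instanceof DOMElement; $n = $n->parentNode) {
        foreach (explode(';', $n->getAttribute('style')) as $declaration) {
            $parts = explode(':', $declaration, 2);
            if (count($parts) === 2 && strtolower(trim($parts[0])) === $property) {
                return trim($parts[1]);
            }
        }
    }
    return null; // no ancestor sets the property
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML('<p style="color: red">Red <span style="font-weight:bold">still red</span>'
    . '<span style="color:blue">blue</span></p><p>none</p>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$result = [];
foreach ($xpath->query('//text()[normalize-space()]') as $textNode) {
    $result[] = [$textNode->data, inheritedStyle($textNode, 'color')];
}
print_r($result);
```

Returning the first match mirrors how the nearest ancestor's inline style overrides outer ones, though real CSS cascading (stylesheets, !important) is not modelled here.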
Hope this gives you a good start.
Upvotes: 10