Sortea2

Reputation: 57

DOM Manipulation with PHP

I would like to do a simple but non-trivial manipulation of DOM elements with PHP, but I am lost.

Assume a page like Wikipedia, where you have paragraphs and titles (<p>, <h2>). They are siblings. I would like to extract both kinds of element in the order in which they appear.

I have tried getElementsByTagName, but then there is no way to organize the information. I have also tried DOMXPath->query(), but I found it really confusing.

Just parsing something like:

<html>
  <head></head>
  <body>
    <h2>Title1</h2>
    <p>Paragraph1</p>
    <p>Paragraph2</p>
    <h2>Title2</h2>
    <p>Paragraph3</p>
  </body>
</html>

into:

Title1
Paragraph1
Paragraph2
Title2
Paragraph3

There are a few bits of HTML in between that I do not need.

Thank you. I hope the question does not look like homework.

Upvotes: 1

Views: 1345

Answers (3)

Salman Arshad

Reputation: 272106

Try having a look at this library and corresponding project:

Simple HTML DOM

This allows you to open an online web page or an HTML page from the filesystem and access its elements via class names, tag names and IDs. If you are familiar with jQuery and its syntax, you will need very little time to get used to this library.

Upvotes: 1

Lee

Reputation: 20934

I have used Simple HTML DOM by S.C. Chen a few times.

It is a handy class for accessing DOM elements.

Example:

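// Load the library (assumes simple_html_dom.php is on the include path)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images and print their src attributes
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links and print their href attributes
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}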

Check it out here: simplehtmldom

It may help with future projects.

Upvotes: 1

Tomalak

Reputation: 338178

I think DOMXPath->query() is the right approach. This XPath expression will return all nodes that are either an <h2> or a <p> directly under <body>, i.e. on the same level (since you said they were siblings).

/html/body/*[name() = 'p' or name() = 'h2']

The nodes will be returned as a node list in the right order (document order). You can then construct a foreach loop over the result.
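To make the loop concrete, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath with the expression above. The function name extractHeadingsAndParagraphs and the extra <div> in the sample markup are only illustrative assumptions, not something from the question; real input would come from a file or URL instead of a string.

```php
<?php
// Hypothetical helper: returns the text of every <h2> and <p> that is a
// direct child of <body>, in document order.
function extractHeadingsAndParagraphs(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well-formed; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("/html/body/*[name() = 'p' or name() = 'h2']");

    $lines = [];
    foreach ($nodes as $node) {
        // textContent is the element's text with any inner markup removed.
        $lines[] = trim($node->textContent);
    }
    return $lines;
}

// Sample markup from the question, plus one unwanted element (assumed).
$sample = <<<MARKUP
<html>
  <head></head>
  <body>
    <h2>Title1</h2>
    <p>Paragraph1</p>
    <div>Something not needed</div>
    <p>Paragraph2</p>
    <h2>Title2</h2>
    <p>Paragraph3</p>
  </body>
</html>
MARKUP;

echo implode("\n", extractHeadingsAndParagraphs($sample)), "\n";
```

Because the expression only matches direct children of <body>, paragraphs nested inside other containers would be skipped; something like //p | //h2 should also return nodes in document order if you need every level.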

Upvotes: 1
