user7342807

Reputation: 313

Parsing table content in php/regex and getting result by td

I have a table like this, and I have spent a full day trying to get the data out of it:

<table class="table table-condensed">
<tr>
<td>Monthely rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>

<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>

<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>

</tr>

<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>

</table>

As you can see the data is well organized, and I am trying to get this result:

monthly rent => Fr. 1'950. 
Rooms(s) => 3
Surface => 93m2
Date of Contract => 01.04.17

I have the table stored in a variable $table and tried to use DOM:

$dom = new DOMDocument(); 
$dom->loadHTML($table);
$dom = new \DomXPath($dom);
$result = $dom->query('//table/tr');
return $result; 

But to no avail. Is there an easier way to get the contents in PHP/regex?

Upvotes: 1

Views: 707

Answers (2)

ThW

Reputation: 19482

You're on the right track with DOM and XPath. Do not use regular expressions to parse HTML/XML. Regular expressions are for matching text and are often used as part of a parser, but a parser for a format knows about its features; a regular expression does not.

You should keep your variable names a little cleaner. Do not assign values of different types to the same variable in the same context (in your code, $dom is first a DOMDocument and then a DOMXPath). Doing so suggests that the variable name is too generic.

DOMXPath::query() lets you use XPath expressions, but only expressions that return a node list. DOMXPath::evaluate() lets you fetch scalar values, too.
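To illustrate the difference, here is a minimal sketch. The one-row table and the variable names are made up for the example; only the DOM classes and methods come from PHP itself.

```php
<?php
// Hypothetical one-row table, used only to illustrate the two methods.
$document = new DOMDocument();
$document->loadHTML(
    '<table><tr><td>Surface</td><td><strong>93m2</strong></td></tr></table>'
);
$xpath = new DOMXPath($document);

// query() with a location path returns a DOMNodeList.
$nodes = $xpath->query('//td');

// evaluate() can also return scalars: XPath numbers arrive as PHP floats,
// XPath strings as PHP strings.
$count = $xpath->evaluate('count(//td)');
$text  = $xpath->evaluate('string(//td[2]/strong)');

var_dump($nodes->length, $count, $text);
```

For a plain location path both methods give you a node list, so evaluate() can stand in for query() in most code.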

So you can fetch the tr elements, iterate over them, and use additional expressions to fetch the two values, with each tr element as the context node.

$document = new \DOMDocument(); 
$document->loadHTML($table);
$xpath = new \DOMXPath($document);

foreach ($xpath->evaluate('//table/tr') as $tr) {
  var_dump(
     $xpath->evaluate('string(td[1])', $tr),
     $xpath->evaluate('string(td[2]/strong)', $tr)
  );
}

Output:

string(13) "Monthely rent"
string(11) "Fr. 1'950. "
string(8) "Rooms(s)"
string(1) "3"
string(7) "Surface"
string(4) "93m2"
string(16) "Date of Contract"
string(8) "01.04.17"
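If you want the label => value pairs from the question rather than a dump, the same loop can fill an array. This is only a sketch: the function name tableToPairs() is my own, the sample markup is copied from the question, and I use normalize-space() instead of string() on the assumption that the trailing space in the rent value is unwanted.

```php
<?php
// Sample markup copied from the question; in real code $table comes from elsewhere.
$table = <<<'HTML'
<table class="table table-condensed">
<tr><td>Monthely rent</td><td><strong>Fr. 1'950. </strong></td></tr>
<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>
<tr><td>Surface</td><td><strong>93m2</strong></td></tr>
<tr><td>Date of Contract</td><td><strong>01.04.17</strong></td></tr>
</table>
HTML;

// Hypothetical helper: maps the first cell of each row to the second cell.
function tableToPairs(string $html): array
{
    $document = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $pairs = [];
    // Only rows that actually have a second cell; //tr also matches rows inside a tbody.
    foreach ($xpath->evaluate('//table//tr[td[2]]') as $tr) {
        // normalize-space() trims and collapses whitespace in the cell text.
        $label = $xpath->evaluate('normalize-space(td[1])', $tr);
        $value = $xpath->evaluate('normalize-space(td[2])', $tr);
        $pairs[$label] = $value;
    }
    return $pairs;
}

foreach (tableToPairs($table) as $label => $value) {
    echo $label, ' => ', $value, "\n";
}
```

Note that using the label as the array key means a repeated label would overwrite the earlier row, which is fine for a table like this one but may not be for others.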

Upvotes: 2

Ray Hunter

Reputation: 15547

Try this out:

$dom = new DOMDocument();
$dom->loadHTML($table);
$dom = new \DomXPath($dom);
$result = $dom->query('//table/tr/td/strong');

foreach($result as $item) {
  echo $item->nodeValue . "\n";
}

That will print the text content of each strong element. However, you will probably want to set up your data in a way that you don't have to deal with HTML tags like <strong>. You might want to use XML or even JSON.

Upvotes: 1
