Reputation: 313
I have a table like this, and I have spent a full day trying to extract the data from it:
<table class="table table-condensed">
<tr>
<td>Monthely rent</td>
<td><strong>Fr. 1'950. </strong></td>
</tr>
<tr>
<td>Rooms(s)</td>
<td><strong>3</strong></td>
</tr>
<tr>
<td>Surface</td>
<td><strong>93m2</strong></td>
</tr>
<tr>
<td>Date of Contract</td>
<td><strong>01.04.17</strong></td>
</tr>
</table>
As you can see the data is well organized, and I am trying to get this result:
monthly rent => Fr. 1'950.
Rooms(s) => 3
Surface => 93m2
Date of Contract => 01.04.17
I have the table in a variable, $table, and tried to use DOM:
$dom = new DOMDocument();
$dom->loadHTML($table);
$dom = new \DomXPath($dom);
$result = $dom->query('//table/tr');
return $result;
But to no avail. Is there an easier way to get the contents in PHP, or with a regex?
Upvotes: 1
Views: 707
Reputation: 19482
You're on the right track with DOM and XPath. Do not use regular expressions to parse HTML/XML. Regular expressions match text and are often used as one part of a parser, but a parser for a format knows about its features; a regular expression does not.
You should also keep your variable names a little cleaner. Do not assign different types to the same variable in the same context. It only shows that the variable name might be too generic.
DOMXPath::query() allows you to use XPath expressions, but only expressions that return a node list. DOMXPath::evaluate() allows you to fetch scalar values, too.
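As a minimal sketch of that difference, assuming a cut-down copy of the table from the question (the $html string and the variable names are illustrative, not part of the original code):

```php
<?php
// Cut-down copy of the table from the question (illustrative data).
$html = '<table><tr><td>Surface</td><td><strong>93m2</strong></td></tr>'
      . '<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr></table>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// A location path yields a DOMNodeList from either method.
$rows = $xpath->query('//table/tr');

// Scalar expressions need evaluate(): count() yields a float, string() a string.
$rowCount   = $xpath->evaluate('count(//table/tr)');
$firstLabel = $xpath->evaluate('string(//table/tr[1]/td[1])');

var_dump($rows->length, $rowCount, $firstLabel);
```

Wrapping a path in string() is what lets evaluate() hand back the text directly instead of a node list that still has to be unpacked.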
So you can fetch the tr elements, iterate over them, and use additional expressions to fetch the two values, using each tr element as the context.
$document = new \DOMDocument();
$document->loadHTML($table);
$xpath = new \DOMXPath($document);
foreach ($xpath->evaluate('//table/tr') as $tr) {
    var_dump(
        $xpath->evaluate('string(td[1])', $tr),
        $xpath->evaluate('string(td[2]/strong)', $tr)
    );
}
Output:
string(13) "Monthely rent"
string(11) "Fr. 1'950. "
string(8) "Rooms(s)"
string(1) "3"
string(7) "Surface"
string(4) "93m2"
string(16) "Date of Contract"
string(8) "01.04.17"
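To get the "label => value" lines asked for in the question, the same loop could collect the pairs into an array first. This is only a sketch: it assumes every row has exactly two cells, the tableToPairs() helper name is made up for illustration, and trim() is added to drop the trailing space in the rent value.

```php
<?php
// Hypothetical helper: turns a two-column HTML table into [label => value].
function tableToPairs(string $html): array
{
    $document = new DOMDocument();
    $document->loadHTML($html);
    $xpath = new DOMXPath($document);

    $pairs = [];
    foreach ($xpath->evaluate('//table/tr') as $tr) {
        // Both expressions are evaluated relative to the current row.
        $label = trim($xpath->evaluate('string(td[1])', $tr));
        // string(td[2]) takes the whole cell text, including the <strong> content.
        $value = trim($xpath->evaluate('string(td[2])', $tr));
        $pairs[$label] = $value;
    }
    return $pairs;
}

// Shortened sample input; in the question this would be the full $table.
$table = '<table class="table table-condensed">'
       . '<tr><td>Rooms(s)</td><td><strong>3</strong></td></tr>'
       . '<tr><td>Surface</td><td><strong>93m2</strong></td></tr>'
       . '</table>';

foreach (tableToPairs($table) as $label => $value) {
    echo $label, ' => ', $value, "\n";
}
```

Keeping the pairs in an array rather than echoing inside the loop makes it easy to reuse them later, for example to look up a single field by its label.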
Upvotes: 2
Reputation: 15547
Try this out:
$dom = new DOMDocument();
$dom->loadHTML($table);
$dom = new \DomXPath($dom);
$result = $dom->query('//table/tr/td/strong');
foreach ($result as $item) {
    echo $item->nodeValue . "\n";
}
That will print the text content of each <strong> element. However, you will probably want to set up your data in a way that means you don't have to deal with HTML tags like <strong>. You might want to use XML or even JSON.
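If the source can deliver the same data as JSON, no HTML parsing is needed at all. A minimal sketch, assuming a hypothetical JSON payload whose keys mirror the table labels (the payload is invented for illustration, and JSON_THROW_ON_ERROR needs PHP 7.3 or later):

```php
<?php
// Hypothetical JSON payload; keys and values mirror the table in the question.
$json = '{"Rooms(s)": "3", "Surface": "93m2", "Date of Contract": "01.04.17"}';

// Decode into an associative array; JSON_THROW_ON_ERROR raises on malformed input.
$data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

foreach ($data as $label => $value) {
    echo $label, ' => ', $value, "\n";
}
```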
Upvotes: 1