Reputation: 1391
I have a database table which consists the following format of data in one column.
<table cellspacing="1" cellpadding="0" border="0" width="395">
<tbody>
<tr>
<td valign="top" width="135">
<p>Calories (kcal)<br>Energy (kj)<br>Fats<br>Carbohydrates<br>Protein<br></p>
</td>
<td valign="top">
<p>178<br>748<br>0 g<br>9.6 g<br>0.1 g<br></p>
</td>
<td valign="top" width="135">
<p>Fiber<br>Sugars<br>Cholesterol<br>Sodium<br>Alcohol<br></p>
</td>
<td valign="top">
<p>0 g<br>-<br>0 mg<br>-<br>26.2 g<br></p>
</td>
</tr>
</tbody>
</table>
I want to make another database which has separate columns for Calories
, Fats
, Carbohydrates
and Protein
.
To separate this data, I need to fetch data from the old database and parse it like this.
$qry = "SELECT * FROM table";
$res = $mysqli->query($qry);
// new dom object
$dom = new DOMDocument();
while ($row = $res->fetch_assoc()) {
$html = @$dom->loadHTML($row['columndata']);
//the table by its tag name
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row)
{
$cols = $row->getElementsByTagName('td');
echo $cols->item(0)->nodeValue.'<br />';
echo $cols->item(1)->nodeValue.'<br />';
}
}
This outputs the following:
Calories (kcal)Energy (kj)FatsCarbohydratesProtein
1787480 g9.6 g0.1 g
I am unable to separate the output string to have correct column values in the new database.
For example, I want to have value 178
in the Calories
column, 0 g
in the Fats
column, etc.
Upvotes: 2
Views: 3526
Reputation: 3476
Try iterating over the child nodes of the P
elements:
foreach ($rows as $row)
{
$paragraphs = $row->getElementsByTagName('p');
//ensure that all the text between <br> is in one text node
$paragraphs->item(0)->normalize();
foreach($paragraphs->item(0)->childNodes as $node) {
if ($node->nodeType == XML_TEXT_NODE) {
echo $node->nodeValue . '<br/>;
}
}
}
It is important to call normalize() on the p
element, to ensure that the texts between br
elements are in one text node each, and not separated, eg <p>Calories (kcal)<br>Energy (kj)<br>...</p>
will have text nodes of Calories (kcal)
and Energy (kj)
, not Cal
, ories (
, kcal)
and so on, which they might be without normalizing.
Upvotes: 2
Reputation: 3401
If you want to get innerHTML of your td element, you may use following construction:
$tdElement = $row->getElementsByTagName('td')->item(0);
$tdElement->ownerDocument->saveHTML( $tdElement );
It should return you inner html of that node as a string.
Upvotes: 2