Reputation: 2149
I'm scraping data from a website using the Simple HTML DOM parser (http://simplehtmldom.sourceforge.net/).
The HTML is:
<tr class="productListing-odd">
<td align="right" class="productListing-data"> 0 </td>
<td class="productListing-data"> <a href="http://www.spellvault.net/p46563/Liliana-of-the-Veil/product_info.html" onmouseout="hd()" onmouseover="sd('images/101257121.jpg')">Liliana of the Veil</a> <br> </td>
<td align="center" class="productListing-data"> Black </td>
<td align="center" class="productListing-data"> Mythic </td>
<td align="center" class="productListing-data"> Innistrad </td>
<td align="right" class="productListing-data">€42,50 </td>
<td align="center" class="productListing-data"><input type="text" name="var[46563]" value="" size="4"> <span class="nowrap"><span class="template-button-left"> </span><span class="template-button-middle"><input class="submitButton" type="submit" value="Bestel"></span><span class="template-button-right"> </span></span> </td>
</tr>
And the PHP:
include_once('simple_html_dom.php');

$html = file_get_html('-the url of the search query on the website-');
$array = array();

foreach ($html->find('.productListing-odd, .productListing-even') as $element) {
    $row = array(
        'name'  => strip_tags($element->childNodes(1)->innertext),
        'set'   => strip_tags($element->childNodes(4)->innertext),
        'price' => strip_tags($element->childNodes(5)->innertext),
        'stock' => strip_tags($element->childNodes(0)->innertext)
    );
    array_push($array, $row);
}

echo json_encode($array);
For some reason, the value of 'price' keeps returning NULL. All the other values are collected properly. I can't figure out why this is happening, since the elements all seem to have the same structure.
Thanks in advance!
Upvotes: 0
Views: 1849
Reputation: 7195
Most likely the HTML you parsed uses a non-Unicode charset. This is a problem because json_encode() only works with UTF-8 encoded strings.
Almost all of the data you parsed consists of ASCII characters, which are encoded identically in UTF-8, so it causes no problems. But the price data (6th column) contains the non-ASCII character '€', on which json_encode() fails (and returns null).
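A minimal sketch of one way to work around this: convert each scraped string to UTF-8 before passing it to json_encode(). The helper name clean_cell() is hypothetical (it is not part of Simple HTML DOM), and the source charset ISO-8859-15 is only an assumption; check the page's Content-Type header or meta tag for the charset it really uses. The sketch also assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper, not part of Simple HTML DOM: strips tags, trims
// whitespace, and converts the result to UTF-8 for json_encode().
// ASSUMPTION: the page is encoded as ISO-8859-15; replace the default
// with the charset the site actually declares.
function clean_cell(string $html, string $sourceCharset = 'ISO-8859-15'): string
{
    $text = trim(strip_tags($html));
    return mb_convert_encoding($text, 'UTF-8', $sourceCharset);
}

// "\xA4" is the euro sign in ISO-8859-15, standing in for the scraped cell.
$price = clean_cell("<td>\xA442,50 </td>");

// JSON_UNESCAPED_UNICODE keeps the euro sign readable instead of \u20ac.
echo json_encode(array('price' => $price), JSON_UNESCAPED_UNICODE);
// prints {"price":"€42,50"}
```

In the loop from the question, each strip_tags(...) call could be replaced with clean_cell(...) on the same innertext value.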
Upvotes: 3