Reputation: 78
<div id="productDetails" class="tabContent active details">
<span>
<b>Case Size:</b>
</span>
44mm
<br>
<span>
<b>Case Thickness:</b>
</span>
13mm
<br>
<span>
<b>Water Resistant:</b>
</span>
5 ATM
<br>
<span>
<b>Brand:</b>
</span>
Fossil
<br>
<span>
<b>Warranty:</b>
</span>
11-year limited
<br>
<span>
<b>Origin:</b>
</span>
Imported
<br>
</div>
How can I get values like 44mm, Fossil, etc. with a DOM parser in PHP?
I can easily get the whole block with
$data=$html->find('div#productDetails',0)->innertext;
var_dump($data);
but I want to split it into meta_key and meta_value pairs for my SQL table.
I can get the meta_key with
$meta_key=$html->find('div#productDetails span',0)->innertext;
but how do I get the meta_value that belongs to it?
Upvotes: 1
Views: 1636
Reputation: 76395
It's not that hard, really: PHP's DOMDocument lets you parse the markup, select all elements of interest, iterate the DOM and read its contents.
$DOM = new DOMDocument();
$DOM->loadHTML($htmlString);
$spans = $DOM->getElementsByTagName('span');
foreach ($spans as $span)
{
    // the value is the text node that directly follows each <span>
    $value = $span->nextSibling ? trim($span->nextSibling->nodeValue) : '';
    echo trim($span->textContent).' - '.$value."\n";
}
That seems to be what you're after, if I'm not mistaken. This is just off the top of my head, but I think this should output something like:
Case Size: - 44mm
Case Thickness: - 13mm
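If you want the pairs as an array rather than echoed text, the same idea (label in the span, value in the text node right after it) can be sketched with DOMXPath. The function name extractProductDetails and the shortened sample markup are made up for illustration, and it assumes every span is followed directly by its value:

```php
<?php
// Hypothetical helper: pairs each <span> label with the text node that follows it.
function extractProductDetails(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $details = array();
    foreach ($xpath->query('//div[@id="productDetails"]/span') as $span) {
        // Label: span text without surrounding whitespace or the trailing colon.
        $key = rtrim(trim($span->textContent), ':');
        // Value: nearest text node after the span, i.e. the text before the <br>.
        $valueNode = $xpath->query('following-sibling::text()[1]', $span)->item(0);
        $details[$key] = $valueNode ? trim($valueNode->nodeValue) : '';
    }
    return $details;
}

// Shortened sample markup (assumption: same shape as the question's HTML).
$html = '<div id="productDetails"><span><b>Case Size:</b></span> 44mm <br>'
      . '<span><b>Water Resistant:</b></span> 5 ATM <br></div>';
print_r(extractProductDetails($html));
```

Because the value is read from a single text node, multi-word values such as "5 ATM" stay in one piece.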
UPDATE:
Here's a tested solution, that returns the desired result, if I'm not mistaken:
$str = "<div id='productDetails' class='tabContent active details'>
<span>
<b>Case Size:</b>
</span>
44mm
<br>
<span>
<b>Case Thickness:</b>
</span>
13mm
<br>
<span>
<b>Water Resistant:</b>
</span>
5 ATM
<br>
<span>
<b>Brand:</b>
</span>
Fossil
<br>
<span>
<b>Warranty:</b>
</span>
11-year limited
<br>
<span>
<b>Origin:</b>
</span>
Imported
<br>
</div>";
$DOM = new DOMDocument();
$DOM->loadHTML($str);
$txt = implode('',explode("\n",$DOM->textContent));
preg_match_all('/([a-z0-9].*?\:).*?([0-9a-z]+)/im',$txt,$matches);
//or if you don't want to include the colon in your match:
preg_match_all('/([a-z0-9][^:]*).*?([0-9a-z]+)/im',$txt,$matches);
for($i = 0, $j = count($matches[1]);$i<$j;$i++)
{
$matches[1][$i] = preg_replace('/\s+/',' ',$matches[1][$i]);
$matches[2][$i] = preg_replace('/\s+/',' ',$matches[2][$i]);
}
$result = array_combine($matches[1],$matches[2]);
var_dump($result);
//result:
array(6) {
["Case Size:"]=> "44mm"
["Case Thickness:"]=> "13mm"
["Water Resistant:"]=> "5"
["ATM Brand:"]=> "Fossil"
["Warranty:"]=> "11"
["year limited Origin:"]=> "Imported"
}
To insert this in your DB:
// prepare once, execute once per key/value pair
$stmt = $pdo->prepare('INSERT INTO your_db.your_table (meta_key, meta_value) VALUES (:key, :value)');
foreach($result as $key => $value)
{
    $stmt->execute(array('key' => $key, 'value' => $value));
}
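As a rough sketch, the inserts could also be wrapped in a transaction so that either all pairs are stored or none are. The function name saveProductMeta and the table name product_meta are placeholders, and the demo assumes the pdo_sqlite extension is available so it can run against an in-memory database:

```php
<?php
// Hypothetical helper; product_meta stands in for your real table name.
function saveProductMeta(PDO $pdo, array $details): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO product_meta (meta_key, meta_value) VALUES (:key, :value)'
    );
    $pdo->beginTransaction();
    try {
        foreach ($details as $key => $value) {
            $stmt->execute(array('key' => $key, 'value' => $value));
        }
        $pdo->commit();
    } catch (Exception $e) {
        // Undo the partial insert and let the caller handle the error.
        $pdo->rollBack();
        throw $e;
    }
    return count($details);
}

// Demo against an in-memory SQLite database (assumes pdo_sqlite is installed).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE product_meta (meta_key TEXT, meta_value TEXT)');
saveProductMeta($pdo, array('Case Size' => '44mm', 'Brand' => 'Fossil'));
```

With your own database you would only swap the DSN and the table name; the prepared statement stays the same.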
Edit
To capture the "11-year limited" substring entirely, you'll need to edit the code above like so:
//replace $txt = implode('',explode("\n",$DOM->textContent));etc... by:
$txt = $DOM->textContent;//leave line-feeds
preg_match_all('/([a-z0-9][^:]*)[^a-z0-9]*([a-z0-9][^\n]+)/im',$txt,$matches);
for($i = 0, $j = count($matches[1]);$i<$j;$i++)
{
$matches[1][$i] = preg_replace('/\s+/',' ',$matches[1][$i]);
$matches[2][$i] = preg_replace('/\s+/',' ',$matches[2][$i]);
}
$matches[2] = array_map('trim',$matches[2]);//remove trailing spaces
$result = array_combine($matches[1],$matches[2]);
var_dump($result);
The output is:
array(6) {
["Case Size"]=> "44mm"
["Case Thickness"]=> "13mm"
["Water Resistant"]=> "5 ATM"
["Brand"]=> "Fossil"
["Warranty"]=> "11-year limited"
["Origin"]=> "Imported"
}
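To try the pattern on its own, without a DOM, it can be wrapped in a small function that takes plain text with the line-feeds left in. The name parseDetailsText and the sample string are assumptions for illustration only:

```php
<?php
// Hypothetical wrapper around the pattern above; expects text that still has line-feeds.
function parseDetailsText(string $text): array
{
    preg_match_all('/([a-z0-9][^:]*)[^a-z0-9]*([a-z0-9][^\n]+)/im', $text, $matches);
    $normalize = function ($s) {
        // Collapse runs of whitespace, then strip leading/trailing blanks.
        return trim(preg_replace('/\s+/', ' ', $s));
    };
    return array_combine(
        array_map($normalize, $matches[1]),
        array_map($normalize, $matches[2])
    );
}

// Sample text shaped like the textContent of the product details div.
$text = "\nCase Size:\n\n44mm\n\nWarranty:\n\n11-year limited\n";
var_dump(parseDetailsText($text));
```

The key group stops at the first colon and the value group runs to the end of its line, which is why "11-year limited" is kept whole.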
Upvotes: 1
Reputation: 1198
You can remove the span tags using simple_html_dom's set_callback API.
Try this:
$url = "";
$html = new simple_html_dom();
$html->load_file($url);
$html->set_callback('my_callback');
$elem = $html->find('div[id=productDetails]');
$product_details = array();
$attrib = array( 1 => 'size', 2 => 'thickness', 3 => 'wr', 4 => 'brand', 5 => 'warranty', 6 => 'origin' );
// split on <br> instead of spaces so multi-word values such as "5 ATM" stay in one piece
$attrib_arr = preg_split('/<br\s*\/?>/i', $elem[0]->innertext);
$attrib_arr = array_map('trim', array_map('strip_tags', $attrib_arr));
// Remove Empty Values
$attrib_arr = array_filter($attrib_arr, 'strlen');
$i = 1;
foreach($attrib_arr as $temp)
{
$product_details[$attrib[$i]] = $temp;
$i++;
}
print_r($product_details);
// remove span tag inside div
function my_callback($element) {
if($element->tag == 'span'){ $element->outertext = ""; }
}
Upvotes: 0