Fii

Reputation: 39

PHP Simple HTML DOM Parser, find text inside tags that have no class nor id

I have this page: http://www.statistics.com/index.php?page=glossary&term_id=703

Specifically, this part:

<b>Additive Error:</b>
<p> Additive error is the error that is added to the true value and does not 
depend on the true value itself. In other words, the result of the measurement is 
considered as a sum of the true value and the additive error:   </p> 

I tried to get the text between the <p> and </p> tags with this:

include('simple_html_dom.php');
$url = 'http://www.statistics.com/index.php?page=glossary&term_id=703';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
$html = new simple_html_dom();
$html->load($curl_scraped_page);

foreach ( $html->find('b') as $e ) {
    echo $e->innertext . '<br>';
}

It gives me:

Additive Error:
Browse Other Glossary Entries

I tried changing the foreach to: foreach ( $html->find('b p') as $e ) {

and then to: foreach ( $html->find('/b p') as $e ) {

Both just give me a blank page. What did I do wrong? Thanks.

Upvotes: 2

Views: 3942

Answers (3)

Khawer Zeshan

Reputation: 9646

Try this

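<?php
$dom = new DOMDocument();
// @ suppresses any warnings libxml raises while parsing the page
@$dom->loadHTMLFile('http://www.statistics.com/index.php?page=glossary&term_id=703');
$xpath = new DOMXPath($dom);

$mytext = '';
foreach ($xpath->query('//font') as $font) {
    // Take the first <p> found inside a <font> element; skip <font> elements without one
    $p = $xpath->query('.//p', $font)->item(0);
    if ($p !== null) {
        $mytext = $p->nodeValue;
        break;
    }
}

echo $mytext;
?>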

Upvotes: 0

nice ass

Reputation: 16709

Why not use PHP's built-in DOM extension and xpath?

libxml_use_internal_errors(true);  // <- you might need this if that page has errors
$dom = new DomDocument();
$dom->loadHtml($curl_scraped_page);
$xpath = new DomXPath($dom);
print $xpath->evaluate('string(//p[preceding::b]/text())');
//                             ^
//  this will get you text content from <p> tags preceded by <b> tags

If there are multiple <p> tags preceded by <b> tags, and you want just the first one, adjust the XPath query to:

string((//p[preceding::b]/text())[1])

To get them all as a DOMNodeList, omit the string() function and call query() instead of evaluate(): //p[preceding::b]/text(). You can then iterate over the list and access the textContent property of each node.
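
A rough sketch of that iteration, in case it helps. The $sample markup is an assumption modeled on the snippet in the question (the live page may be structured differently), and $texts is just an illustrative variable name:

```php
<?php
// Hypothetical sample modeled on the question's snippet; not the real page.
$sample = '<html><body><b>Additive Error:</b>'
    . '<p>Additive error is the error that is added to the true value.</p>'
    . '<p>Second paragraph.</p></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList of the matching text nodes
$texts = [];
foreach ($xpath->query('//p[preceding::b]/text()') as $node) {
    $texts[] = trim($node->textContent);
}

print implode("\n", $texts) . "\n";
```

On the real page you would pass $curl_scraped_page to loadHTML() instead of $sample.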

Upvotes: 1

Joel Hinz

Reputation: 25374

If you want all content which is inside b or p tags, you can simply do foreach ($html->find('b,p') as $e) { ... }.
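
If you would rather not depend on simple_html_dom, roughly the same "all b and p" selection can be sketched with the built-in DOM extension and an XPath union. This is only an illustration: the $sample markup is an assumption based on the question's snippet and output, and $found is a made-up variable name.

```php
<?php
// Hypothetical sample modeled on the question; the real page may differ.
$sample = '<html><body><b>Additive Error:</b>'
    . '<p>Additive error is the error that is added to the true value.</p>'
    . '<b>Browse Other Glossary Entries</b></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// '//b | //p' is an XPath union: every <b> element plus every <p> element
$found = [];
foreach ($xpath->query('//b | //p') as $node) {
    $found[] = $node->nodeName . ': ' . trim($node->textContent);
}

print implode("\n", $found) . "\n";
```

Note that in the question's snippet the <p> comes after the <b> rather than inside it, which would explain why the descendant selector 'b p' matched nothing.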

Upvotes: 0
