Reputation: 39
I have this page: http://www.statistics.com/index.php?page=glossary&term_id=703
Specifically, this part:
<b>Additive Error:</b>
<p> Additive error is the error that is added to the true value and does not
depend on the true value itself. In other words, the result of the measurement is
considered as a sum of the true value and the additive error: </p>
I tried my best to get the text between the <p> and </p> tags with this:
include('simple_html_dom.php');
$url = 'http://www.statistics.com/index.php?page=glossary&term_id=703';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
$html = new simple_html_dom();
$html->load($curl_scraped_page);
foreach ($html->find('b') as $e) {
    echo $e->innertext . '<br>';
}
It gives me:
Additive Error:
Browse Other Glossary Entries
I tried changing the foreach to foreach ( $html->find('b p') as $e ) { and then to foreach ( $html->find('/b p') as $e ) {, but both just give me a blank page. What did I do wrong? Thanks.
Upvotes: 2
Views: 3942
Reputation: 9646
Try this
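<?php
libxml_use_internal_errors(true); // collect parse warnings instead of silencing them with @
$dom = new DOMDocument();
$dom->loadHTMLFile('http://www.statistics.com/index.php?page=glossary&term_id=703');
$xpath = new DOMXPath($dom);
$mytext = '';
// Take the first <p> found inside any <font> element.
foreach ($xpath->query('//font') as $font) {
    $p = $xpath->query('.//p', $font)->item(0);
    if ($p !== null) { // item(0) is null when this <font> contains no <p>
        $mytext = $p->nodeValue;
        break;
    }
}
echo $mytext;
?>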
Upvotes: 0
Reputation: 16709
Why not use PHP's built-in DOM extension and xpath?
libxml_use_internal_errors(true); // <- you might need this if that page has errors
$dom = new DomDocument();
$dom->loadHtml($curl_scraped_page);
$xpath = new DomXPath($dom);
print $xpath->evaluate('string(//p[preceding::b]/text())');
// this will get you text content from <p> tags preceded by <b> tags
If there are multiple <p> tags preceded by <b> tags and you want just the first one, adjust the XPath query to:
string((//p[preceding::b]/text())[1])
To get them all as a DOMNodeList, omit the string() function: //p[preceding::b]/text()
You can then iterate over the list and access the textContent property of each node.
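As a rough sketch of that last step, the loop below collects the text of every matching node. The $sampleHtml string is a made-up stand-in for the fetched page (the real markup may differ), and $paragraphs is just an illustrative variable name:

```php
<?php
// Hypothetical stand-in for the scraped page; the real markup may differ.
$sampleHtml = '<html><body>'
    . '<b>Additive Error:</b>'
    . '<p> Additive error is the error that is added to the true value. </p>'
    . '<p> A second paragraph. </p>'
    . '</body></html>';

libxml_use_internal_errors(true); // keep warnings about malformed HTML quiet

$dom = new DOMDocument();
$dom->loadHTML($sampleHtml);
$xpath = new DOMXPath($dom);

// Without string(), query() returns a DOMNodeList of text nodes.
$paragraphs = [];
foreach ($xpath->query('//p[preceding::b]/text()') as $textNode) {
    $paragraphs[] = trim($textNode->textContent);
}

print_r($paragraphs);
```

On the real page you would pass $curl_scraped_page to loadHTML() instead of the sample string.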
Upvotes: 1
Reputation: 25374
If you want all content which is inside b or p tags, you can simply do foreach ($html->find('b,p') as $e) { ... }.
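If you would rather avoid the extra library, something similar can probably be done with the built-in DOM extension, where the XPath union //b | //p plays the role of the 'b,p' selector. This is only a sketch: $pageHtml is a made-up sample rather than the real page, and $matches is an illustrative name.

```php
<?php
// Hypothetical sample markup; the real page will contain much more.
$pageHtml = '<html><body>'
    . '<b>Additive Error:</b>'
    . '<p>Additive error is the error that is added to the true value.</p>'
    . '<b>Browse Other Glossary Entries</b>'
    . '</body></html>';

libxml_use_internal_errors(true); // keep warnings about malformed HTML quiet

$doc = new DOMDocument();
$doc->loadHTML($pageHtml);
$xp = new DOMXPath($doc);

// The union operator selects every <b> and every <p>, in document order.
$matches = [];
foreach ($xp->query('//b | //p') as $node) {
    $matches[] = $node->nodeName . ': ' . trim($node->textContent);
}

print_r($matches);
```

Note that 'b p' in the question is a descendant selector (a p inside a b), which is why it matched nothing here: the p follows the b rather than sitting inside it.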
Upvotes: 0