Masteryogurt

Reputation: 175

PHP curl/XPath: scraping data based on the text inside a <p>?

I know how to use XPath to echo text from another website by targeting tags via a div id, class, etc., using the code below. What I don't know is how to do it under more precise conditions, for example when scraping a bit of text that has no unique identifier such as a div id. The code below prints the scraped data.

$doc = new DOMDocument;

// We don't want to bother with white spaces
$doc->preserveWhiteSpace = false;

// Most HTML Developers are chimps and produce invalid markup...
$doc->strictErrorChecking = false;
$doc->recover = true;

$doc->loadHTMLFile('http://www.nbcnews.com/business');

$xpath = new DOMXPath($doc);

$query = "//div[@class='market']";

$entries = $xpath->query($query);
foreach ($entries as $entry) {
    echo trim($entry->textContent);  // use `trim` to strip surrounding whitespace
}

In the example source below, I want to pull the value "21,271.97". There is no unique tag for it and no div id. Is it possible to pull this data by matching a keyword in the <p> that never changes, for example "DJIA all time"?

<p>DJIA All Time, Record-High Close: <font color="#0000FF">June 9, 
2017</font> 
(<font color="#FF0000"><b bgcolor="#FFFFCC"><font face="Verdana, Arial, 
Helvetica, sans-serif" size="2">21,271.97</font></b></font>)</p>

I'm wondering whether I could replace $query = "//div[@class='market']"; with something along the lines of $query = "//p['DJIA all time']";

Could this be possible?

I also wonder whether a loop with something like $query = "//p[='DJIA']"; could work, though I don't know exactly how to use that. Thanks!

Upvotes: 0

Views: 170

Answers (2)

Andersson

Reputation: 52685

Try the XPath expression below:

//p[contains(text(), "DJIA All Time")]//b/font
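As a minimal sketch of how this expression could be plugged into PHP, assuming the markup is exactly the snippet from the question (the live page may differ) and loading it from a string instead of fetching a URL:

```php
<?php
// Markup copied from the question; the real page may be structured differently.
$html = <<<'HTML'
<p>DJIA All Time, Record-High Close: <font color="#0000FF">June 9,
2017</font>
(<font color="#FF0000"><b bgcolor="#FFFFCC"><font face="Verdana, Arial,
Helvetica, sans-serif" size="2">21,271.97</font></b></font>)</p>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Find the <p> whose first text node contains the fixed label,
// then descend to the <font> nested inside the <b>.
$nodes = $xpath->query('//p[contains(text(), "DJIA All Time")]//b/font');

// query() returns a DOMNodeList; guard against an empty result.
$value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $value, PHP_EOL;
```

Note that the match is case-sensitive, so the label has to be spelled "DJIA All Time" exactly as it appears in the page.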

For the page at the link you provided (http://www.nbcnews.com/business), you can get the required text with:

//span[text()="DJIA"]/following-sibling::span[@class="market_item market_price"]
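A rough illustration of what that expression selects, using hypothetical markup: only the "DJIA" label and the "market_item market_price" class come from the expression above, while the surrounding structure, the market_name class and the numbers are invented for the example.

```php
<?php
// Hypothetical markup, NOT copied from the real page.
$html = <<<'HTML'
<div class="market">
  <span class="market_item market_name">DJIA</span>
  <span class="market_item market_price">12,345.67</span>
</div>
<div class="market">
  <span class="market_item market_name">NASDAQ</span>
  <span class="market_item market_price">7,654.32</span>
</div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Start at the <span> whose text is exactly "DJIA", then take the later
// sibling <span> whose class attribute is exactly "market_item market_price".
$query = '//span[text()="DJIA"]/following-sibling::span[@class="market_item market_price"]';
$nodes = $xpath->query($query);

$price = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $price, PHP_EOL;
```

The NASDAQ price is skipped because following-sibling only looks at siblings of the span that matched "DJIA".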

Upvotes: 1

Nigel Ren

Reputation: 57131

It's worth experimenting with an online XPath tester - I use https://www.freeformatter.com/xpath-tester.html#ad-output. For the <p> in your question, something like this should match:

$query = "//p[contains(text(),'DJIA')]";

Although on the page you're after, I've found that the value seems to be the first match for...

$query = "//span[contains(@class,'market_price')]";

But the idea is the same in both cases: using contains(source, value) will match a set of nodes. In the first case text() is the text value of the node; the second looks for a specific class name within the class attribute.
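Here is a small sketch of both uses of contains() side by side, assuming simplified markup (the <p> is trimmed down from the question, and the span and its value are made up). It also shows that matching is case-sensitive, so 'DJIA all time' as written in the question would not match; one possible workaround in XPath 1.0 is to lowercase the text with translate().

```php
<?php
// Simplified, partly invented markup for illustration only.
$html = '<p>DJIA All Time, Record-High Close: <b>21,271.97</b></p>'
      . '<span class="market_item market_price">12,345.67</span>';

libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// contains() on text(): matches the <p> by part of its text.
$exact = $xpath->query("//p[contains(text(), 'DJIA All Time')]");

// Same test with different casing: no match, comparison is case-sensitive.
$wrongCase = $xpath->query("//p[contains(text(), 'DJIA all time')]");

// contains() on @class: matches even though the attribute holds two classes.
$byClass = $xpath->query("//span[contains(@class, 'market_price')]");

// Case-insensitive variant: lowercase the text with translate() first.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';
$insensitive = $xpath->query(
    "//p[contains(translate(text(), '$upper', '$lower'), 'djia all time')]"
);

printf(
    "exact=%d wrongCase=%d byClass=%d insensitive=%d\n",
    $exact->length,
    $wrongCase->length,
    $byClass->length,
    $insensitive->length
);
```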

Upvotes: 1
