Reputation: 180
I am new to PHP. I want to write code to find the id specified in the HTML code below, which is 1123. Can anyone give me some idea?
<span class="miniprofile-container /companies/1123?miniprofile="
data-tracking="NUS_CMPY_FOL-nhre"
data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&fc=2">
<strong>
<a href="http://www.linkedin.com/nus-trk?trkact=viewCompanyProfile&pk=biz-overview-public&pp=1&poster=&uid=5674666402166894592&ut=NUS_UNIU_FOLLOW_CMPY&r=&f=0&url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fcompany%2F1123%3Ftrk%3DNUS_CMPY_FOL-nhre&urlhash=7qbc">
Bank of America
</a>
</strong>
</span> has a new Project Manager
Note: I don't need the content of the span. I need the id that is embedded in the span's class attribute.
I tried the following:
$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML($html);
$xmlElements = simplexml_import_dom($dom);
$id = $xmlElements->xpath("//span [@class='miniprofile-container /companies/$data_id?miniprofile=']");
... but I don't know how to proceed further.
Upvotes: 1
Views: 383
Reputation: 22783
This should do what you are after:
$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );
/*
 * the following XPath query finds the class attribute of every span element
 * whose class attribute contains the token "miniprofile-container" and a token starting with "/companies/"
*/
$nodes = $xpath->query( "//span[contains(concat(' ', @class, ' '), ' miniprofile-container ') and contains(concat(' ', @class, ' '), ' /companies/')]/@class" );
foreach( $nodes as $node )
{
    // extract the number found between "/companies/" and "?miniprofile" in the node's nodeValue;
    // only read $matches[ 1 ] when the pattern actually matched
    if( preg_match( '#/companies/(\d+)\?miniprofile#', $node->nodeValue, $matches ) )
    {
        var_dump( $matches[ 1 ] );
    }
}
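If you need this in more than one place, the same idea can be wrapped in a small helper that returns every id it finds. This is only a sketch: the function name `extractCompanyIds` is hypothetical (it is not part of any library), it assumes PHP 7 or later for the type declarations, and it collects libxml warnings internally rather than silencing them with `@`.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns every company id
// found in the class attribute of <span> elements carrying "miniprofile-container".
function extractCompanyIds(string $html): array
{
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Real-world HTML is rarely valid; keep libxml's warnings internal
    // instead of suppressing them with the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' miniprofile-container ')]/@class"
    );

    $ids = [];
    foreach ($nodes as $node) {
        // The id sits between "/companies/" and "?miniprofile" in the attribute value.
        if (preg_match('#/companies/(\d+)\?miniprofile#', $node->nodeValue, $matches)) {
            $ids[] = $matches[1];
        }
    }

    return $ids;
}

$html = '<span class="miniprofile-container /companies/1123?miniprofile=">Bank of America</span>';
print_r(extractCompanyIds($html));
```

The ids come back as strings; cast them with `(int)` if you need integers.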
Upvotes: 1
Reputation: 891
Depending on your needs, you could do:
$matches = array();
preg_match('|<span class="miniprofile-container /companies/(\d+)\?miniprofile|', $html, $matches);
print_r($matches);
This is a very trivial regex, but it could serve as a first suggestion. If you want to go via DOMDocument or SimpleXML, you shouldn't mix the two as you did in your example. Which approach do you prefer? We can narrow this down from there.
//edit: pretty much what @fireeyedboy said, but this is what I just fiddled together:
<?php
$html = <<<EOD
<html><head></head>
<body>
<span class="miniprofile-container /companies/1123?miniprofile="
data-tracking="NUS_CMPY_FOL-nhre"
data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&fc=2">
<strong>
<a href="#">
Bank of America
</a>
</strong>
</span> has a new Project Manager
</body>
</html>
EOD;
$domDocument = new DOMDocument('1.0', 'UTF-8');
$domDocument->recover = TRUE;
$domDocument->loadHTML($html);
$xPath = new DOMXPath($domDocument);
$relevantElements = $xPath->query('//span[contains(@class, "miniprofile-container")]');
$foundId = NULL;
foreach($relevantElements as $match) {
    $pregMatches = array();
    if (preg_match('|/companies/(\d+)\?miniprofile|', $match->getAttribute('class'), $pregMatches)) {
        if (isset($pregMatches[1])) {
            $foundId = $pregMatches[1];
            break;
        }
    }
}
echo $foundId;
?>
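If only the first id is needed, the extraction can also be done inside XPath itself, without a regex. This is an alternative sketch rather than what the code above does: it uses `DOMXPath::evaluate()` with the XPath 1.0 string functions `substring-after()` and `substring-before()`, and it assumes the id is always followed by a `?` in the class attribute. Unlike the regex, it does not check that the extracted value consists of digits.

```php
<?php
// Shortened sample markup, for illustration only.
$html = '<span class="miniprofile-container /companies/1123?miniprofile="><strong><a href="#">Bank of America</a></strong></span>';

$dom = new DOMDocument('1.0', 'UTF-8');

// Keep libxml parse warnings internal instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// evaluate() returns a string here because the expression is a string function.
// Given a node set, substring-after() uses the first matching node in document order.
$id = $xpath->evaluate(
    'substring-before(substring-after(//span[contains(@class, "miniprofile-container")]/@class, "/companies/"), "?")'
);

// An empty string means no matching span (or no "/companies/...?" part) was found.
echo $id === '' ? 'no id found' : $id;
```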
Upvotes: 1