starnetdev

Reputation: 978

Some beginner questions about PHP SimpleXML and xpath

I am learning PHP SimpleXML and I have some questions. I have been experimenting with fetching markup from a page on my work intranet. I need generic code wherever possible, since the markup could change at any time. In my example I select a div tag and all its children.

...
  <div class="cabTabs">
      <ul>
          <li><a href="/link1">Info1</a></li>
          <li><a href="/link2">Info2</a></li>
          <li><a href="/link3">Info3</a></li>
      </ul>
  </div>
...


//Get all web content:
$b = new sfWebBrowser(); //using symfony 1.4.17 sfWebBrowser to get a SimpleXML object.
$b->get('http://intranetwebexample'); //returns an sfWebBrowser object.
$xml = $b->getResponseXML(); //returns a SimpleXMLElement

//[Eclipse xdebug Watch - $xml]
"$xml"    SimpleXMLElement     
  @attributes Array [3]   
  head    SimpleXMLElement    
  body    SimpleXMLElement


//Get the div class="cabTabs".
$result = $xml->xpath('//descendant::div[@class="cabTabs"]'); 

//[Eclipse xdebug Watch - $result]
"$result" Array [1]   
  0   SimpleXMLElement    
      @attributes Array [1]   
          class   cabTabs 
      ul  SimpleXMLElement    
          li  Array [6]

Questions:

  1. The use of descendant:: prefix:
    I have read in other Stack Overflow topics that the descendant:: prefix is not recommended. What is the right way to select a tag and all its content? I'm using the code above, but I don't know whether it's the right way to do it.

  2. Some questions checking the Eclipse xdebug variable Watch:

2.1 Sometimes I can't expand the SimpleXML tree more than one or two levels. In the example above, I can't expand the "li" node to see its children.
Could it be a limitation of the xdebug debugger with SimpleXML objects, or maybe a limitation of the Eclipse Watch?
I can expand and see the "li" node without problems when I reach it through its parent with the usual loop: foreach($ul->li as $li).
It's not a critical bug, but it would be ideal to see the node directly, and I would like to report it in the proper forum.

2.2 I don't fully understand the result of $xml->xpath:
Looking at the Eclipse Watch, the "div" tag has been converted to a 0 index key, but the "ul" and "li" tags have kept their original names. Why?

2.3 How to access/loop over xpath content with generic code:
I'm using the following non-generic code to access it:

foreach ($result as $record) {        
    foreach($record->ul as $ul) { 
        foreach($ul->li as $li) {
            foreach($li->a as $a) {
                echo ' ' . $a->name;
            }
        }
    }
}

The above code works, but only if we write the right tag names (->ul, ->li, ->a...).
What is the generic way to loop through all the content without having to specify the child name each time?
I would also prefer not to convert it to an array, unless that's the right way.
I have been trying the children() method, but it doesn't work: it stops and crashes on this line: foreach ($result->children() as $ul)

Thanks a lot in advance for taking the time to read my questions. Any help is really welcome :)

System info:
symfony 1.4.17 with sfWebBrowserPlugin, cURL adapter.
PHP 5.4.0 with cURL support enabled, cURL Information 7.24.0

Upvotes: 0

Views: 613

Answers (3)

starnetdev

Reputation: 978

Worked like a charm hehe.

Here I am adding a complete function which searches recursively for a substring in all attributes of a node and its subnodes, and returns the full string where it was found.

In my case it's perfect for searching for values like href= and other dynamically generated attribute values. It also shows an implementation of what we have discussed above. It can probably be improved, and more safety checks can be added.

/* public function bSimpleXMLfindfullstringwithsubstring($node, $sSearchforsubstring, &$sFullstringfound, &$bfoundsubstring)
 * Recursive function to search for the first substring in a list of SimpleXML objects, looking in all its children, in all their attributes.
 * Returns true if the substring has been found.
 * Parameter return:
 *   $sFullstringfound: returns the full string where the substring has been found.
 *   $bfoundsubstring: returns true if the substring has been found.
*/

public function bSimpleXMLfindfullstringwithsubstring($node, $sSearchforsubstring, &$sFullstringfound, &$bfoundsubstring=false)
{
  $bRet = false; 
  if ((isset($node) && ($bfoundsubstring == false)))
  {
      //If the node has attributes
      if ($node->attributes()->count() > 0)
      {
          //Search the string in all the elements of the current SimpleXML object.
          foreach ($node->attributes() AS $name => $attribute)  //[$name = class , (string)$attribute = cabTabs, $attribute = SimpleXML object]
          {
              //(Take care of charset if necessary).
              if (stripos((string)$attribute, $sSearchforsubstring) !== false)
              {
                  //substring found in one of the attributes.
                  $sFullstringfound = (string)$attribute;
                  $bfoundsubstring = true;
                  $bRet = true;
                  break;
              }
          }
      }

      //If the node has children (subnodes)
      if (($node->count() > 0) && ($bfoundsubstring == false))
      {
          foreach ($node->children() as $nodechildren)
          {
              if ($bfoundsubstring == false)
              {
                  //Search in the next child and propagate its result, so the function returns true when a descendant matches.
                  $bRet = self::bSimpleXMLfindfullstringwithsubstring($nodechildren, $sSearchforsubstring, $sFullstringfound, $bfoundsubstring);
              }
              else
              {
                  break;
              }
          }
      }
  }
  return $bRet;
}

How to call it:

$b = new sfWebBrowser();
$b->get('http://www.example.com/example.html');
$xml = $b->getResponseXMLfixed();     
$result = $xml->xpath('//descendant::div[@class="cabTabs"]'); //example

$sFullString = "";
$bfoundsubstring = false;
foreach ($result as $record)
{
  self::bSimpleXMLfindfullstringwithsubstring($record, "/substring/tosearch", $sFullString, $bfoundsubstring);
}
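
As an alternative to the recursion, a single XPath query can do a similar search. This is only a sketch under assumptions: findAttributeContaining() is a hypothetical helper name, the sample markup is copied from the question, and the needle must not contain a double quote because it is pasted into the expression. Note that XPath contains() is case-sensitive, unlike the stripos() call used above.

```php
<?php
// Sample markup from the question; the real page would come from sfWebBrowser.
$xml = simplexml_load_string(
    '<div class="cabTabs"><ul>'
    . '<li><a href="/link1">Info1</a></li>'
    . '<li><a href="/link2">Info2</a></li>'
    . '</ul></div>'
);

// Hypothetical helper: returns the first attribute value containing $needle, or null.
function findAttributeContaining(SimpleXMLElement $node, $needle)
{
    // Look at every attribute of $node itself and of all its descendants.
    // Assumption: $needle contains no double quote.
    $matches = $node->xpath('descendant-or-self::*/@*[contains(., "' . $needle . '")]');

    // empty() covers both "no matches" and a failed query.
    return empty($matches) ? null : (string) $matches[0];
}

var_dump(findAttributeContaining($xml, '/link2'));
var_dump(findAttributeContaining($xml, 'missing'));
```

The trade-off is that the XPath version is shorter but cannot do a case-insensitive match as easily as the recursive function.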

Upvotes: 0

starnetdev

Reputation: 978

I think I now fully understand problems 2.2 and 2.3.

Since xpath() returns an Array[1], as you explained, and not a SimpleXML object, I can never use $result->children(), because a PHP array doesn't have a children() method, hehe. (I'm a bit of an idiot, lol.)

The solution is simple, as you have explained: count the elements of the array, loop over them, and then loop again using the children() method on each element that is a SimpleXML object. I'll add the right code below.
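
The idea can be sketched roughly like this. It is a minimal example, not the final code: walkNode() is a hypothetical helper name, and the markup is the sample from the question loaded with simplexml_load_string() instead of sfWebBrowser.

```php
<?php
// Sample markup copied from the question.
$html = <<<XML
<html><body>
  <div class="cabTabs">
    <ul>
      <li><a href="/link1">Info1</a></li>
      <li><a href="/link2">Info2</a></li>
      <li><a href="/link3">Info3</a></li>
    </ul>
  </div>
</body></html>
XML;

// Hypothetical helper: visits a node and all its descendants without naming any tag.
function walkNode(SimpleXMLElement $node, $depth = 0, array &$lines = array())
{
    // trim() drops the whitespace-only text of container elements such as ul.
    $lines[] = str_repeat('  ', $depth) . $node->getName() . ': ' . trim((string) $node);

    // children() works here because $node is a SimpleXMLElement, not an array.
    foreach ($node->children() as $child) {
        walkNode($child, $depth + 1, $lines);
    }
    return $lines;
}

$xml = simplexml_load_string($html);
$result = $xml->xpath('//div[@class="cabTabs"]'); // plain PHP array of matches

$lines = array();
foreach ($result as $record) { // loop over the array first...
    walkNode($record, 0, $lines); // ...then recurse through each match
}
echo implode("\n", $lines), "\n";
```

The outer foreach handles the array that xpath() returns, and the recursion handles any nesting depth, so no tag names (->ul, ->li, ->a) appear in the loop.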

I will also report the point 2.1 problem with the Eclipse Watch or xdebug on their forums, to find out what the real cause is.

Thank you prodigitalson, very useful answer :)

Upvotes: 0

prodigitalson

Reputation: 60413

  1. I don't know, I've never used it myself.

  2. I don't know, I usually use Zend Debug, but I don't understand your question anyway... I think you left out some words :-)

2.1 Probably xdebug/Eclipse. I'd check the preferences; there's probably a setting that limits the depth of recursion to help manage memory.

2.2 SimpleXMLElement::xpath() returns an array of matched nodes, even when only one node matches. That's why your result is an integer-indexed array. So if you query //someelement you get an array of all someelement tags. You can then access their descendants in the normal fashion, like $someelement->itschildelement.

2.3 children() is a good way to get at things in a generic sense (call it on each element of the array, not on the array itself). If Xdebug is crashing, that's just Xdebug. Either turn it off, ignore it, or find a different debugger :-) Xdebug is just a tool and shouldn't dictate how you implement things.
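
A small sketch of points 2.2 and 1, assuming the sample markup from the question loaded with simplexml_load_string() rather than sfWebBrowser. It shows the integer-indexed array that xpath() returns, and that for this query the shorter //div[...] form matches the same node as the //descendant::div[...] form.

```php
<?php
$xml = simplexml_load_string(
    '<html><body><div class="cabTabs"><ul>'
    . '<li><a href="/link1">Info1</a></li>'
    . '<li><a href="/link2">Info2</a></li>'
    . '</ul></div></body></html>'
);

$short = $xml->xpath('//div[@class="cabTabs"]');
$long  = $xml->xpath('//descendant::div[@class="cabTabs"]');

var_dump(is_array($short));                // xpath() hands back a plain PHP array
var_dump(array_keys($short));              // integer keys, one per matched node
var_dump(count($short) === count($long));  // both expressions match the same div here

// Each array entry is a SimpleXMLElement, so its children keep their tag names.
$div = $short[0];
foreach ($div->ul->li as $li) {
    echo (string) $li->a, ' -> ', (string) $li->a['href'], "\n";
}
```

The 0 key seen in the debugger is just the array index of the first match; ul and li keep their names because they are properties of the SimpleXMLElement stored at that index.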

Upvotes: 1
