Reputation: 1938
This is a follow-up to my question from yesterday, "Recursive UL LI to PHP multi-dimensional array". I've almost managed to convert the HTML block to an array, but there is one problem I cannot fix: when processing the HTML block below, the output array does not quite match the structure of the input. I cannot see where I'm going wrong and need a fresh pair of eyes.
I've included the following items: the HTML block, the PHP function and processing code, and the output.
HTML Block
Basically takes the form of:
-A
-B
-C
----
-D
-E
-F
----
-G
-H
-I
As follows:
<li>
    <ul>
        <li>A</li>
        <li>
            <ul>
                <li>B</li>
                <li>
                    <ul>
                        <li>C</li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</li>
<li>
    <ul>
        <li>D</li>
        <li>
            <ul>
                <li>E</li>
                <li>
                    <ul>
                        <li>F</li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</li>
<li>
    <ul>
        <li>G</li>
        <li>
            <ul>
                <li>H</li>
                <li>
                    <ul>
                        <li>I</li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</li>
PHP Function and Processing
function process_ul($output_data, $data, $key, $level_data, $level_key){
    if(substr($data[$key], 0, 3) == '<ul'){
        // going down a level in the tree
        $level_key++;
        // check to see if the level key exists within the level data, else create it and set to zero
        if(!is_numeric($level_data[$level_key])){
            $level_data[$level_key] = 0;
        }
        // increment the key to look at the next line
        $key++;
        if(substr($data[$key], 0, 4) !== '</ul'){
            while(substr($data[$key], 0, 4) !== '</ul'){
                // whilst we don't have an end of list, do some recursion and keep processing the array
                $returnables = process_ul($output_data, $data, $key, $level_data, $level_key);
                $output_data = $returnables['output'];
                $data = $returnables['data'];
                $key = $returnables['key'];
                $level_data = $returnables['level_data'];
                $level_key = $returnables['level_key'];
            }
        }
    }
    if(substr($data[$key], 0, 4) !== '</ul' && $data[$key] !== "<li>" && $data[$key] !== "</li>"){
        // we don't want to be saving lines with no data or the ends of a list
        // get the array key value so we know where to save it in our array (basically so we can't overwrite anything that may already exist
        $this_key = &$output_data;
        for($build_key=0;$build_key<($level_key+1); $build_key++){
            $this_key =& $this_key[$level_data[$build_key]];
        }
        if(is_array($this_key)){
            // look at the next key, find the next open one
            $this_key[(array_pop(array_keys($this_key))+1)] = $data[$key];
        } else {
            // a new entry, so nothing to worry about
            $this_key = $data[$key];
        }
        $level_data[$level_key]++;
    } else if(substr($data[$key], 0, 4) == '</ul'){
        // going up a level in the tree
        $level_key--;
    }
    // increment the key to look at the next line when we loop in a moment
    $key++;
    // prepare the data to be returned
    $return_me = array();
    $return_me['output'] = $output_data;
    $return_me['data'] = $data;
    $return_me['key'] = $key;
    $return_me['level_data'] = $level_data;
    $return_me['level_key'] = $level_key;
    // return the data
    return $return_me;
}
// explode the data coming in by looking at the new lines
$input_array = explode("\n", $html_ul_tree_in);
// get rid of any empty lines - we don't like those
foreach($input_array as $key => $value){
    if(trim($value) !== ""){
        $input_data[] = trim($value);
    }
}
// set the array and the starting level
$levels = array();
$levels[0] = 0;
$this_level = 0;
// loop around the data and process it
for($i=0; $i<count($input_data); $i){
    $returnables = process_ul($output_data, $input_data, $i, $levels, $this_level);
    $output_data = $returnables['output'];
    $input_data = $returnables['data'];
    $i = $returnables['key'];
    $levels = $returnables['level_data'];
    $this_level = $returnables['level_key'];
}
// let's see how we did
print_r($output_data);
Output
Note that D is in the wrong position: it should be at [0][2], not [0][1][2], and every position after D is out by one place (I'm sure you can tell by looking).
Basically takes the form of:
-A
-B
-C
-D
----
-E
-F
-G
----
-H
-I
As follows:
Array
(
    [0] => Array
        (
            [0] => <li>A</li>
            [1] => Array
                (
                    [0] => <li>B</li>
                    [1] => Array
                        (
                            [0] => <li>C</li>
                        )
                    [2] => <li>D</li>
                )
            [2] => Array
                (
                    [1] => <li>E</li>
                    [2] => Array
                        (
                            [1] => <li>F</li>
                        )
                    [3] => <li>G</li>
                )
            [3] => Array
                (
                    [2] => <li>H</li>
                    [3] => Array
                        (
                            [2] => <li>I</li>
                        )
                )
        )
)
Thanks for your time - any assistance in outputting the array correctly would be greatly appreciated!
Upvotes: 3
Views: 4185
Reputation: 29932
Here is a working example that parses the HTML and turns it into an array, using DOMDocument and the domNodeToArray() function provided here: http://www.ermshaus.org/2010/12/php-transform-domnode-to-array
The HTML doesn't need to be well-formed.
// $inputHTML is your HTML-list as a string
// this is necessary to prevent DOMDocument errors on HTML5-elements
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// UTF-8 hack, to correctly handle UTF-8 through DOMDocument
$dom->loadHTML('<?xml encoding="UTF-8">' . $inputHTML);
// get the first list-element in the HTML-document
$listAsDom = $dom->getElementsByTagName('ul')->item(0);
// print it out as array
var_dump(domNodeToArray($listAsDom));
/**
 * Transforms the contents of a DOMNode to an associative array
 * @author Marc Ermshaus
 * http://www.ermshaus.org/2010/12/php-transform-domnode-to-array
 *
 * @param DOMNode $node DOMDocument node
 * @return mixed Associative array or string with node content
 */
function domNodeToArray(DOMNode $node) {
    $ret = '';
    if ($node->hasChildNodes()) {
        if ($node->firstChild === $node->lastChild
            && $node->firstChild->nodeType === XML_TEXT_NODE
        ) {
            // Node contains nothing but a text node, return its value
            $ret = trim($node->nodeValue);
        } else {
            // Otherwise, do recursion
            $ret = array();
            foreach ($node->childNodes as $child) {
                if ($child->nodeType !== XML_TEXT_NODE) {
                    // If there's more than one node with this node name on the
                    // current level, create an array
                    if (isset($ret[$child->nodeName])) {
                        if (!is_array($ret[$child->nodeName])
                            || !isset($ret[$child->nodeName][0])
                        ) {
                            $tmp = $ret[$child->nodeName];
                            $ret[$child->nodeName] = array();
                            $ret[$child->nodeName][] = $tmp;
                        }
                        $ret[$child->nodeName][] = domNodeToArray($child);
                    } else {
                        $ret[$child->nodeName] = domNodeToArray($child);
                    }
                }
            }
        }
    }
    return $ret;
}
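If what you're after is the plain, numerically indexed nesting from the question rather than an array keyed by tag names, a DOM walk along the following lines might be enough. This is only a sketch: listNodeToArray() is a hypothetical helper of mine, not part of the linked article, and it assumes each <li> holds either plain text or a single nested <ul>. The sample $inputHTML is a shortened version of the question's data.

```php
<?php
// Hypothetical helper (not from the linked article): converts a <ul> node
// into a nested, numerically indexed array of its <li> contents.
function listNodeToArray(DOMNode $ul): array
{
    $out = [];
    foreach ($ul->childNodes as $li) {
        // Skip whitespace text nodes and anything that is not an <li>
        if ($li->nodeType !== XML_ELEMENT_NODE || $li->nodeName !== 'li') {
            continue;
        }
        // Look for a <ul> sitting directly inside this <li>
        $nested = null;
        foreach ($li->childNodes as $child) {
            if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'ul') {
                $nested = $child;
                break;
            }
        }
        // Assumption: an <li> is either a sub-list or a plain text item
        $out[] = $nested !== null ? listNodeToArray($nested) : trim($li->textContent);
    }
    return $out;
}

// Shortened sample input (an assumption for illustration)
$inputHTML = '<ul><li>A</li><li><ul><li>B</li><li><ul><li>C</li></ul></li></ul></li></ul>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($inputHTML);

$result = listNodeToArray($dom->getElementsByTagName('ul')->item(0));
print_r($result); // nested array: A, then [B, [C]]
```

The helper checks nodeType before nodeName so that whitespace between tags is ignored, which keeps the numeric keys contiguous at every level.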
Upvotes: 1
Reputation: 88647
If your lists are always well formed, you could use this to do what you want. It uses SimpleXML so it will not be forgiving of mistakes and bad form in the input code. If you want to be forgiving, you will need to use DOM - the code will be a little more complex, but not ridiculously so.
function ul_to_array ($ul) {
    if (is_string($ul)) {
        if (!$ul = simplexml_load_string("<ul>$ul</ul>")) {
            trigger_error("Syntax error in UL/LI structure");
            return FALSE;
        }
        return ul_to_array($ul);
    } else if (is_object($ul)) {
        $output = array();
        foreach ($ul->li as $li) {
            $output[] = (isset($li->ul)) ? ul_to_array($li->ul) : (string) $li;
        }
        return $output;
    } else return FALSE;
}
It takes data in the exact form as provided in the question - with no outer enclosing <ul> tags. If you want to pass the outer <ul> tags as part of the input string, just change
if (!$ul = simplexml_load_string("<ul>$ul</ul>")) {
to
if (!$ul = simplexml_load_string($ul)) {
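As a quick check of the SimpleXML approach, here is a small sketch that feeds two of the question's three groups into the function. The function body is repeated so the snippet runs on its own; the $html variable and its shortened content are my own assumptions for illustration.

```php
<?php
// Same function as in the answer, repeated so this snippet is self-contained
function ul_to_array ($ul) {
    if (is_string($ul)) {
        // Wrap the fragment so SimpleXML sees a single root element
        if (!$ul = simplexml_load_string("<ul>$ul</ul>")) {
            trigger_error("Syntax error in UL/LI structure");
            return FALSE;
        }
        return ul_to_array($ul);
    } else if (is_object($ul)) {
        $output = array();
        foreach ($ul->li as $li) {
            // An <li> holding a <ul> becomes a sub-array, anything else a string
            $output[] = (isset($li->ul)) ? ul_to_array($li->ul) : (string) $li;
        }
        return $output;
    } else return FALSE;
}

// Shortened input in the question's form: no outer <ul> (illustrative data)
$html = '<li><ul><li>A</li><li><ul><li>B</li><li><ul><li>C</li></ul></li></ul></li></ul></li>'
      . '<li><ul><li>D</li><li><ul><li>E</li><li><ul><li>F</li></ul></li></ul></li></ul></li>';

$result = ul_to_array($html);
print_r($result); // two top-level groups: [A, [B, [C]]] and [D, [E, [F]]]
```

Each top-level <li> becomes its own sub-array, so D ends up as the first element of the second group rather than trailing after C.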
Upvotes: 3