peace_love
peace_love

Reputation: 6461

How can I make my function faster (the function is looping through a multidimensional array)?

My function is extremely slowly! Does anyone know what I did wrong or if it is possible to make it faster?

function explodeTree($array, $delimiter = "_", $baseval = false) {
    if(!is_array($array)) return false;
    $splitRE = "/" . preg_quote($delimiter, "/") . "/";
    $returnArr = array();
    foreach ($array as $key => $val) {
                $parts = preg_split($splitRE, $val['path'], -1, PREG_SPLIT_NO_EMPTY);
                $leafPart = array_pop($parts);
                $parentArr = &$returnArr;
                foreach ($parts as $part) {
                    if (!isset($parentArr[$part])) {
                        $parentArr[$part] = array();
                    } elseif (!is_array($parentArr[$part])) {
                        if ($baseval) {
                            $parentArr[$part] = array("__base_val" => $parentArr[$part]);
                        } else {
                            $parentArr[$part] = array();
                        }
                    }
                    $parentArr = &$parentArr[$part];
                }
                if (empty($parentArr[$leafPart])) {
                    $parentArr[$leafPart] = $val;
                } elseif ($baseval && is_array($parentArr[$leafPart])) {
                    $parentArr[$leafPart]["__base_val"] = $val;
                }

    }
    return $returnArr;
}

Array before:

array(4) {
  [0]=>
  array(3) {
    ["path"]=>
    string(30) "Volumes/folder1/horse/fred"
    ["age"]=>
    string(2) "12"
    ["name"]=>
    string(4) "fred"
  }
  [1]=>
  array(3) {
    ["path"]=>
    string(28) "Volumes/folder1/cat/john"
    ["age"]=>
    string(2) "10"
    ["name"]=>
    string(4) "john"
  }
  [2]=>
  array(3) {
    ["path"]=>
    string(27) "Volumes/folder2/cat/sam"
    ["age"]=>
    string(2) "11"
    ["name"]=>
    string(3) "sam"
  }
  [3]=>
  array(3) {
    ["path"]=>
    string(32) "Volumes/folder2/cat/cat/john"
    ["age"]=>
    string(2) "16"
    ["name"]=>
    string(4) "john"
  }
}

Array after using function:

array(1) {
  ["Volumes"]=>
  array(2) {
    ["folder1"]=>
    array(2) {
      ["horse"]=>
      array(1) {
        ["fred"]=>
        array(3) {
          ["path"]=>
          string(30) "Volumes/folder1/horse/fred"
          ["age"]=>
          string(2) "12"
          ["name"]=>
          string(4) "fred"
        }
      }
      ["cat"]=>
      array(1) {
        ["john"]=>
        array(3) {
          ["path"]=>
          string(28) "Volumes/folder1/cat/john"
          ["age"]=>
          string(2) "10"
          ["name"]=>
          string(4) "john"
        }
      }
    }
    ["folder2"]=>
    array(1) {
      ["cat"]=>
      array(2) {
        ["sam"]=>
        array(3) {
          ["path"]=>
          string(27) "Volumes/folder2/cat/sam"
          ["age"]=>
          string(2) "11"
          ["name"]=>
          string(3) "sam"
        }
        ["cat"]=>
        array(1) {
          ["john"]=>
          array(3) {
            ["path"]=>
            string(32) "Volumes/folder2/cat/cat/john"
            ["age"]=>
            string(2) "16"
            ["name"]=>
            string(4) "john"
          }
        }
      }
    }
  }
}

Upvotes: 1

Views: 52

Answers (2)

axiac
axiac

Reputation: 72176

It seems your code produces the same output, no matter if $baseval is TRUE or FALSE. The following code produces the same output, runs fast and gracefully ignores the value of $baseval too:

function explodeTree(array $array, $delimiter = "_", $baseval = false)
{
    # Build the output here
    $returnArr = array();

    foreach ($array as $item) {
        # Split the path using the delimiter, drop the empty segments
        $pieces = array_filter(explode($delimiter, $item['path']));
        # Turn the path into a nested array
        # Each component of the path is the only key on its level
        # Build it from the leaf up to the root
        $a = array_reduce(
            array_reverse($pieces),        # Start from the leaf
            function (array $carry, $piece) {     # Create parent node...
                return array($piece => $carry);   # ... use the path piece as key
            },
            $item                          # Put the item itself as leaf
        );

        # Combine the new path (nested arrays) into the existing tree
        # array_merge_recursive() takes care of all the levels
        $returnArr = array_merge_recursive($returnArr, $a);
    }

    # That's all
    return $returnArr;
}

Adding $baseval

I think the purpose of $baseval is to put the original properties of an item into a new entry under the key __base_val if a subsequent path adds children to a leaf node. For example, if the last entry has 'Volumes/folder2/cat/sam/john' as path then the output of the current code ends with:

["folder2"]=>
array(1) {
  ["cat"]=>
  array(1) {
    ["sam"]=>
    array(4) {
      ["path"]=>
      string(23) "Volumes/folder2/cat/sam"
      ["age"]=>
      string(2) "11"
      ["name"]=>
      string(3) "sam"
      ["john"]=>
      array(3) {
        ["path"]=>
        string(28) "Volumes/folder2/cat/sam/john"
        ["age"]=>
        string(2) "16"
        ["name"]=>
        string(4) "john"
      }
    }
  }
}

but the expected output should be (I think):

["folder2"]=>
array(1) {
  ["cat"]=>
  array(1) {
    ["sam"]=>
    array(2) {
      ["__base_val"]=>
      array(3) {
        ["path"]=>
        string(23) "Volumes/folder2/cat/sam"
        ["age"]=>
        string(2) "11"
        ["name"]=>
        string(3) "sam"
      }
      ["john"]=>
      array(3) {
        ["path"]=>
        string(28) "Volumes/folder2/cat/sam/john"
        ["age"]=>
        string(2) "16"
        ["name"]=>
        string(4) "john"
      }
    }
  }
}

The above function cannot be modified to produce this output. array_merge_recursive() doesn't change the structure of the arrays it is provided to merge; it just combines them.

A complete rethink and rewrite of the function is needed:

function explodeTree(array $array, $delimiter = "_", $baseval = false)
{
   # Build the output here
   $returnArr = array();

   foreach ($array as $item) {
      # Split the path using the delimiter, drop the empty segments
      $pieces = array_filter(explode($delimiter, $item['path']));
      # Keep a reference to the current node; start from the root of the tree we build
      $pos = &$returnArr;
      foreach ($pieces as $piece) {
        if (! array_key_exists($piece, $pos)) {
            # The path component doesn't exist in the tree; add it
            $pos[$piece] = array();
        } elseif ($baseval && array_key_exists('path', $pos[$piece])) {
            # The component exists, it is a leaf node (has 'path' property) and $baseval is TRUE
            # Save the existing node content
            $val = $pos[$piece];
            # Replace it with a new level; store the old leaf in '__base_val'
            $pos[$piece] = array('__base_val' => $val);
        }
        # Advance to the next level
        $pos = &$pos[$piece];
      }

      # If $baseval is TRUE, make sure we don't mix leaf nodes with inner nodes
      if ($baseval && ! empty($pos)) {
         # The node already has children; put the item in '__base_val'
         $pos['__base_val'] = $item;
      } else {
         # The node was just added; store $item in it
        $pos = array_merge($pos, $item);
      }

      unset($pos);
   }

   return $returnArr;
}

Upvotes: 2

dland
dland

Reputation: 4419

You might be running into huge amounts of memory reallocations. If you add a $begin = microtime(true); at the top of the outer foreach, and then printf('%0.6f', microtime(true) - $begin); at the end of the loop, do you see the times staying at a roughly stable level or does it get slower as the execution advances?

It looks like you're building a trie. In which case https://github.com/ikcede/PHP-Trie may be of use.

Upvotes: 0

Related Questions