Mike
Mike

Reputation: 5816

Recursive function too slow for creating hierarchical select list

I have a query that gets me a Geo-Structure from the database.

Categories Table (30.000 rows inside):

id  title          parent  type 
-------------------------------
1   germany             0     1
2   bavaria             1     2
3   upper bavaria       2     3
4   munich              3     4
6   italy               0     1
7   toscana             6     2
8   city florence       7     3
9   florence            8     4

Categories Language Table

cid language title
--------------------------
1   en-UK    germany
2   de-DE    deutschland

Objects table:

id  title   landid  regionid    uregionid   cityid
--------------------------------------------------
1   o1           1         2            3        4
2   o2           1         2            3        4 
3   o3           6         7            8        9

MySQL query:

SELECT c.id, c.title, l.title AS translated, c.type, c.parent, count(c.id) as cnt
FROM category c
LEFT JOIN objects o ON (o.landid = c.id  OR o.regionid = c.id OR o.uregionid = c.id OR o.cityid = c.id) 
LEFT JOIN category_lang l ON l.cid = c.id AND l.language = "en-UK"
WHERE c.published = 1 AND o.published = 1
GROUP BY c.id
ORDER BY c.parent

I get an associative array ($tree) with values like here:

Array
(
    [0] => Array
        (
            [id] => 1
            [title] => Germany
            [type] => 1
            [parent] => 0
            [cnt] => 1
        )

    [1] => Array
        (
            [id] => 6
            [title] => Italy
            [type] => 1
            [parent] => 0
            [cnt] => 1
        )

    [2] => Array
        (
            [id] => 2
            [title] => Bavaria
            [type] => 2
            [parent] => 1
            [cnt] => 1
        )

    [3] => Array
        (
            [id] => 7
            [title] => Toscana
            [type] => 2
            [parent] => 6
            [cnt] => 1
        )

    [4] => Array
        (
            [id] => 3
            [title] => Upper Bavaria
            [type] => 3
            [parent] => 2
            [cnt] => 1
        )

    [5] => Array
        (
            [id] => 8
            [title] => City Florence
            [type] => 3
            [parent] => 7
            [cnt] => 1
        )

    [6] => Array
        (
            [id] => 4
            [title] => Munich
            [type] => 4
            [parent] => 3
            [cnt] => 1
        )

    [7] => Array
        (
            [id] => 9
            [title] => Florence
            [type] => 4
            [parent] => 8
            [cnt] => 1
        )

)

Then i create a structure that will prepare the display of a select list:

public static function buildTree($tree, $root = 0) { 
        $return = array(); 
        foreach($tree as $child) { 
            if($child['parent'] == $root) { 
                $return[] = array( 
                    'name' => $child, 
                    'next' => self::buildTree($tree, $child['id']) 
                ); 
            } 
        } 
        return empty($return) ? null : $return;     
    } 

Then i send the structure inside $return to a function to create the final select list:

public static function buildSelect($tree, $s) {
        $option = '';
        if(!is_null($tree) && count($tree) > 0) {
            foreach($tree as $node) {
                $selected = '';
                $class_type = $node['name']['type'];                
                $option .= '<option value="'.$node['name']['id'].'" class="h'.$class_type.'" '.$selected.' data-type="'.$node['name']['type'].'">'
                            .$node['name']['title']. ' (' . $node['name']['cnt'] . ')</option>'
                            . self::buildSelect($node['next'], $s); 
            }
            return $option; 
        }    
    } 

This all works but if the Geo-Structure gets really big the db query gets terrible slow. Would appreciate any ideas about how to speed this up, thanks!

Upvotes: 0

Views: 546

Answers (2)

Josef Kufner
Josef Kufner

Reputation: 2989

Take a look at Nested Sets. It is fairly easy to add calculated tree_left, tree_right, and tree_depth to your current parent-based model — parent stays as primary data, other three will be recalculated on change.

Then, you can use a much simpler select to get items for the whole subtree of categories and thanks to the tree_depth column it is easy to calculate correct indentation in your <select> element or in listings without any recursion.

Upvotes: 1

Kickstart
Kickstart

Reputation: 21533

The basics seem there, and without setting up a lot of test data it is difficult to do any testing.

The following has a couple of minor tweaks (you will need to change to cope with where you want to keep the compare routine). Possibly not enough to fix the issue. However I would be interested to know what effect it has, and also some timings of which method is taking the time, and how it ramps up:-

<?php

function cmp($a, $b)
{
    return strcmp($a["parent"], $b["parent"]);
}

usort($tree, "cmp");

$recursive_tree = self::buildTree($tree);

echo self::buildSelect($recursive_tree, '');

public static function buildTree(&$tree, $root = 0) 
{ 
    $return = array(); 
    foreach($tree as $child) 
    { 
        if($child['parent'] == $root) 
        { 
            $return[] = array( 
                'name' => $child, 
                'next' => self::buildTree($tree, $child['id']) 
            ); 
        }
        else
        {
            if ($child['parent'] > $root)
            {
                break;
            }
        }
    } 
    return empty($return) ? null : $return;     
} 


public static function buildSelect(&$tree, $s) 
{
    $option = '';
    if(!is_null($tree) && count($tree) > 0) 
    {
        foreach($tree as $node) 
        {
            $selected = '';
            $class_type = $node['name']['type'];                
            $option .= '<option value="'.$node['name']['id'].'" class="h'.$class_type.'" '.$selected.' data-type="'.$node['name']['type'].'">'
                        .$node['name']['title']. ' (' . $node['name']['cnt'] . ')</option>'
                        . self::buildSelect($node['next'], $s); 
        }
        return $option; 
    }    
}

Does the output have to be built up, or can be be directly output (even to a temp file)?

EDIT

If your objects table has indexes on landid, regionid, uregionid and cityid (ie, separate indexes on each one) then try this:-

SELECT c.id, c.title, c.type, c.parent, count(c.id) as cnt
FROM category c
LEFT JOIN objects o1 ON o1.landid = c.id AND o1.published = 1
LEFT JOIN objects o2 ON o2.regionid = c.id AND o2.published = 1
LEFT JOIN objects o3 ON o3.uregionid = c.id AND o3.published = 1
LEFT JOIN objects o4 ON o4.cityid = c.id AND o4.published = 1
WHERE c.published = 1 
AND (01.id IS NOT NULL
OR 02.id IS NOT NULL
OR 03.id IS NOT NULL
OR 04.id IS NOT NULL)
GROUP BY c.id, 
        c.title, 
        c.type, 
        c.parent
ORDER BY c.parent

EDIT

Adding in your language table:-

SELECT c.id,
        c.title, 
        l.title AS translated, 
        c.type, 
        c.parent, 
        count(c.id) as cnt
FROM category c
LEFT OUTER JOIN category_lang l ON l.cid = c.id AND l.language = "en-UK"
LEFT OUTER JOIN objects o1 ON o1.landid = c.id AND o1.published = 1
LEFT OUTER JOIN objects o2 ON o2.regionid = c.id AND o2.published = 1
LEFT OUTER JOIN objects o3 ON o3.uregionid = c.id AND o3.published = 1
LEFT OUTER JOIN objects o4 ON o4.cityid = c.id AND o4.published = 1
WHERE c.published = 1 
AND (01.id IS NOT NULL
OR 02.id IS NOT NULL
OR 03.id IS NOT NULL
OR 04.id IS NOT NULL)
GROUP BY c.id, 
        c.title, 
        translated,
        c.type, 
        c.parent
ORDER BY c.parent

As to whether this or the UNION solution is better, that will come down largely to your data. For example this solution will struggle if the 4 id columns in the objects table can have duplicates (unlikely, as I would doubt a city can also be a region).

Upvotes: 1

Related Questions