Piotr Chabros
Piotr Chabros

Reputation: 475

Efficiency in SQL query


I have created a search functionality to my cake application. It is built from multiple select boxes in which you can select data, then it cycles through the selected options and implements them to the SQL syntax.

Basically this is how the function looks like:

$selectedFilters = $this->data;
        $selectSQL =    'SELECT
                        agencies.agency, agencies.website_url, agencies.status, agencies.size, agencies.id, OfficeData.id, ContactData.name, ContactData.surname, ContactData.job_title, ContactData.email, 
                        ContactData.mobile, OfficeCountryData.country
                        FROM agencies
                        LEFT JOIN (SELECT agencies_industries.agency_id, agencies_industries.industry_id FROM agencies_industries) AS IndustryData ON agencies.id = IndustryData.agency_id
                        LEFT JOIN (SELECT agencies_professions.agency_id, agencies_professions.profession_id FROM agencies_professions) AS ProfessionData ON agencies.id = ProfessionData.agency_id
                        LEFT JOIN (SELECT agencies_sectors.agency_id, agencies_sectors.sector_id FROM agencies_sectors) AS SectorData ON agencies.id = SectorData.agency_id
                        LEFT JOIN (SELECT agencies_seniorities.agency_id, agencies_seniorities.seniority_id FROM agencies_seniorities) AS SeniorityData ON agencies.id = SeniorityData.agency_id
                        LEFT JOIN (SELECT agencies_zones.agency_id, agencies_zones.zone_id FROM agencies_zones) AS ZonesData ON agencies.id = ZonesData.agency_id
                        LEFT JOIN (SELECT agencies_countries.agency_id, agencies_countries.country_id FROM agencies_countries) AS CountryData ON agencies.id = CountryData.agency_id
                        LEFT JOIN (SELECT agencies_regions.agency_id, agencies_regions.region_id FROM agencies_regions) AS RegionData ON agencies.id = RegionData.agency_id
                        LEFT JOIN (SELECT agencies_cities.agency_id, agencies_cities.city_id FROM agencies_cities) AS CityData ON agencies.id = CityData.agency_id
                        LEFT JOIN (SELECT agencies_specialisms.agency_id, agencies_specialisms.specialism_id FROM agencies_specialisms) AS SpecialismData ON agencies.id = SpecialismData.agency_id
                        LEFT JOIN (SELECT offices.id, offices.agency_id, offices.hq FROM offices WHERE offices.hq = "1") AS OfficeData ON agencies.id = OfficeData.agency_id
                        LEFT JOIN (SELECT countries.id, countries.country FROM countries) AS OfficeCountryData ON OfficeData.hq = OfficeCountryData.id
                        LEFT JOIN (SELECT contacts.name, contacts.surname, contacts.agency_id, contacts.job_title, contacts.email, contacts.mobile FROM contacts) AS ContactData ON agencies.id = ContactData.agency_id
                        ';
        $whereSQL = ' WHERE 1 = 1 ';
            foreach($selectedFilters as $key)
                foreach($key as $name=>$value){
                    if(is_array($key))
                        foreach($key as $key=>$value){
                            $i = 0;
                            $connector = 'AND';
                            if(is_array($value)){
                                foreach($value as $value){
                                    if($i > 0)
                                        $connector = 'OR';
                                    $i++;
                                    switch($key){
                                        case 'Profession': $whereSQL .= $connector.' ProfessionData.profession_id = ' . $value . ' ';
                                        break;
                                        case 'Specialism': $whereSQL .= $connector.' SpecialismData.specialism_id = ' . $value . ' ';
                                        break;
                                        case 'SubSpecialism': $whereSQL .= ''; //$whereSQL .= $connector.' SubData.sub_specialism_id = ' . $value . ' ';
                                        break;
                                        case 'Seniority': $whereSQL .= $connector.' SeniorityData.seniority_id = ' . $value . ' ';
                                        break;
                                        case 'Industry': $whereSQL .= $connector.' IndustryData.industry_id = ' . $value . ' ';
                                        break;
                                        case 'Zone': $whereSQL .= $connector.' ZonesData.zone_id = ' . $value . ' ';
                                        break;
                                        case 'Country': $whereSQL .= $connector.' CountryData.country_id = ' . $value . ' ';
                                        break;
                                        case 'Region': $whereSQL .= $connector.' RegionData.region_id = ' . $value . ' ';
                                        break;
                                        case 'City': $whereSQL .= $connector.' CityData.city_id = ' . $value . ' ';
                                        break;
                                        case 'Sector': $whereSQL .= $connector.' SectorData.sector_id = ' . $value . ' ';
                                        break;
                                        case 'status': $whereSQL .= $connector.' agencies.status = "' . $value . '" ';
                                        break;
                                        case 'size': $whereSQL .= $connector.' agencies.size = "' . $value . '" ';
                                        break;
                                    }
                                }
                            }
                            else
                                if(!isBlank($value) && $key != 'Search')
                                    $whereSQL .= $connector.' agencies.'.$key.' = "'.$value.'" ';
                        }
                }
        $groupBySQL = 'GROUP BY agencies.id ORDER BY agencies.id ASC';
        $resultAgencies = $this->Agency->query($selectSQL . $whereSQL . $groupBySQL);
        $this->set(compact('resultAgencies'));

The problem I encountered with my search is that it works really slow. This happens because of using too many LEFT JOIN commands. Every LEFT JOIN selects data from distinct tables and gathers them all creating another table. Then the data is displayed.

I need someone to give me a hint how to do this not using so many LEFT JOINs.

Cheers.

Upvotes: 0

Views: 148

Answers (6)

Learn to use MySQL's EXPLAIN syntax. Edit your question, and include the output of your EXPLAIN plan.

Among other problems, your doing left joins on a lot of tables that you're not selecting from. Try this.

SELECT agencies.agency, agencies.website_url, agencies.status, agencies.size, 
       agencies.id, 
       OfficeData.id, 
       ContactData.name, ContactData.surname, ContactData.job_title, 
       ContactData.email, ContactData.mobile, OfficeCountryData.country
FROM agencies
LEFT JOIN (SELECT offices.id, offices.agency_id, offices.hq 
           FROM offices 
           WHERE offices.hq = "1") AS OfficeData 
       ON agencies.id = OfficeData.agency_id
LEFT JOIN countries AS OfficeCountryData 
       ON OfficeData.hq = OfficeCountryData.id
LEFT JOIN contacts AS ContactData 
       ON agencies.id = ContactData.agency_id

How does that affect performance?

There's probably no compelling reason to identify countries, cities, and regions with ID numbers; they carry their identity around in their name. Test replacing the id number with the proper names. (ID numbers always require a join operation to get useful data back; natural keys often eliminate joins.)

You've commented that performance without unneeded joins is good, and that the switch statement is not to blame. If that's true, then you need to reduce the number of joins. Fortunately, reducing joins is both simple and straightforward.

If you must "report the universe", you might try splitting the query up and submitting multiple queries asychronously. Return and display, say, agency and contact data first, and you will greatly increase the apparent speed of the application. And the dbms can be processing the second query while the first is rendering. Often apparent speed is more important than actual speed.

Upvotes: 1

Nin
Nin

Reputation: 3020

Try this one:

$selectSQL =    'SELECT
                        agencies.agency, agencies.website_url, agencies.status, agencies.size, agencies.id, OfficeData.id, ContactData.name, ContactData.surname, ContactData.job_title, ContactData.email, 
                        ContactData.mobile, OfficeCountryData.country
                        FROM agencies
                        LEFT JOIN agencies_industries AS IndustryData ON agencies.id = IndustryData.agency_id
                        LEFT JOIN  agencies_professions AS ProfessionData ON agencies.id = ProfessionData.agency_id
                        LEFT JOIN  agencies_sectors AS SectorData ON agencies.id = SectorData.agency_id
                        LEFT JOIN  agencies_seniorities AS SeniorityData ON agencies.id = SeniorityData.agency_id
                        LEFT JOIN  agencies_zones AS ZonesData ON agencies.id = ZonesData.agency_id
                        LEFT JOIN agencies_countries AS CountryData ON agencies.id = CountryData.agency_id
                        LEFT JOIN  agencies_regions AS RegionData ON agencies.id = RegionData.agency_id
                        LEFT JOIN  agencies_cities AS CityData ON agencies.id = CityData.agency_id
                        LEFT JOIN  agencies_specialism AS SpecialismData ON agencies.id = SpecialismData.agency_id
                        LEFT JOIN  offices  AS OfficeData ON (agencies.id = OfficeData.agency_id AND OfficeData.hq = "1")
                        LEFT JOIN countries AS OfficeCountryData ON OfficeData.hq = OfficeCountryData.id
                        LEFT JOIN  contacts AS ContactData ON agencies.id = ContactData.agency_id
                        ';

But even then it might be slow since you join too many tables. But it's hard to tell without knowing anything about your data and the amount of rows that you'll return. You might want to move some JOINS to a subquery (like country) if you only return a few rows. Or you can add that information in a seperate query.

Edit: Without knowing your data and db-structure it's hard to tell. There are a lot of things that influence the speed of your query. First rewrite your query so that tables that are not used for your selection (i.e. the WHERE) or fields you want to show are not used in your query. So if you make no selection (emtpy $selectedFilters) the you don't have to include the industries, professions, sectors, seniorities, etc tables.:

$selectedFilters = $this->data;
        $selectSQL =    'SELECT
                        agencies.agency, agencies.website_url, agencies.status, agencies.size, agencies.id, OfficeData.id, ContactData.name, ContactData.surname, ContactData.job_title, ContactData.email, 
                        ContactData.mobile, OfficeCountryData.country
                        FROM agencies';


        $sql2='                LEFT JOIN  offices  AS OfficeData ON (agencies.id = OfficeData.agency_id AND OfficeData.hq = "1")
                        LEFT JOIN countries AS OfficeCountryData ON OfficeData.hq = OfficeCountryData.id
                        LEFT JOIN  contacts AS ContactData ON agencies.id = ContactData.agency_id
                        ';

        $whereSQL = ' WHERE 1 = 1 ';
            foreach($selectedFilters as $key)
                foreach($key as $name=>$value){
                    if(is_array($key))
                        foreach($key as $key=>$value){
                            $i = 0;
                            $connector = 'AND';
                            if(is_array($value)){
                                foreach($value as $value){
                                    if($i > 0)
                                        $connector = 'OR';
                                    $i++;
                                    switch($key){
                                        case 'Profession': $whereSQL .= $connector.' ProfessionData.profession_id = ' . $value . ' ';
                                        $sql2.=' LEFT JOIN  agencies_professions AS ProfessionData ON agencies.id = ProfessionData.agency_id ';
                                        break;
                                        case 'Specialism': $whereSQL .= $connector.' SpecialismData.specialism_id = ' . $value . ' ';
                                        $sql2.=' LEFT JOIN  agencies_specialism AS SpecialismData ON agencies.id = SpecialismData.agency_id ';
                                        break;
                                        case 'SubSpecialism': $whereSQL .= ''; //$whereSQL .= $connector.' SubData.sub_specialism_id = ' . $value . ' ';
                                        break;
                                        case 'Seniority': $whereSQL .= $connector.' SeniorityData.seniority_id = ' . $value . ' ';
                                        $sql2.=' LEFT JOIN  agencies_seniorities AS SeniorityData ON agencies.id = SeniorityData.agency_id ';
                                        break;
                                        case 'Industry': $whereSQL .= $connector.' IndustryData.industry_id = ' . $value . ' ';
                                        $sql2=' LEFT JOIN agencies_industries AS IndustryData ON agencies.id = IndustryData.agency_id ';
                                        break;
                                        case 'Zone': $whereSQL .= $connector.' ZonesData.zone_id = ' . $value . ' ';
                                        $sql2.=' LEFT JOIN  agencies_zones AS ZonesData ON agencies.id = ZonesData.agency_id ';
                                        break;
                                        case 'Country': $whereSQL .= $connector.' CountryData.country_id = ' . $value . ' ';
                                        $sql2.=' LEFT JOIN agencies_countries AS CountryData ON agencies.id = CountryData.agency_id ';
                                        break;
                                        case 'Region': $whereSQL .= $connector.' RegionData.region_id = ' . $value . ' ';
                                        $sql2.=' LEFT JOIN  agencies_regions AS RegionData ON agencies.id = RegionData.agency_id ';
                                        break;
                                        case 'City': $whereSQL .= $connector.' CityData.city_id = ' . $value . ' ';
                                        $sql2.=' LEFT JOIN  agencies_cities AS CityData ON agencies.id = CityData.agency_id ';
                                        break;
                                        case 'Sector': $whereSQL .= $connector.' SectorData.sector_id = ' . $value . ' ';
                                        $sql2.='LEFT JOIN  agencies_sectors AS SectorData ON agencies.id = SectorData.agency_id ';
                                        break;
                                        case 'status': $whereSQL .= $connector.' agencies.status = "' . $value . '" ';
                                        break;
                                        case 'size': $whereSQL .= $connector.' agencies.size = "' . $value . '" ';
                                        break;
                                    }
                                }
                            }
                            else
                                if(!isBlank($value) && $key != 'Search')
                                    $whereSQL .= $connector.' agencies.'.$key.' = "'.$value.'" ';
                        }
                }
        $groupBySQL = 'GROUP BY agencies.id ORDER BY agencies.id ASC';
        $resultAgencies = $this->Agency->query($selectSQL . $sql2 . $whereSQL . $groupBySQL);
        $this->set(compact('resultAgencies'));

Second take a good look at your indexes for each table. Make sure you have an index on the fields you use in the JOINS.

Third, look at the field types you use. Don't use a INT if a SMALLINT is large enough.

Finaly: Normalization is great, but sometimes it's better to combine some things, even if that means you have duplicate data.

Upvotes: 4

Hugh
Hugh

Reputation: 1459

You should use joins and not subqueries. You also might not always need all those left joins; I can see your WHERE statement is dynamic, so as part of your switch statement, you could decide which extra tables you need to join in to.

So start by joining in only the tables you need columns from;

$selectSQL = "
    SELECT agencies.agency, 
       agencies.website_url, 
       agencies.status, 
       agencies.size, 
       agencies.id, 
       OfficeData.id, 
       ContactData.name, 
       ContactData.surname, 
       ContactData.job_title, 
       ContactData.email,               
       ContactData.mobile, 
       OfficeCountryData.country
  FROM agencies
  LEFT JOIN offices AS OfficeData          ON ( agencies.id = OfficeData.agency_id )
  LEFT JOIN contacts AS ContactData        ON ( agencies.id = ContactData.agency_id )
  LEFT JOIN countries AS OfficeCountryData ON ( OfficeData.hq = OfficeCountryData.id ) "

And then as you build your where statements, you can evaluate if you need to join in a table for that clause to be valid.

$whereSQL = 'WHERE OfficeData.hq = "1"';
$joinSQL ='';

# Loop though your filter options and build up the where and joins
foreach(...){
    switch($key){
        case 'Profession': 
            $whereSQL .= $connector.' ProfessionData.profession_id = ' . $value . ' ';
            $joinSQL .= 'LEFT JOIN agencies_professions AS ProfessionData ON (agencies.id = ProfessionData.agency_id)'
        break;
        ....
    }
}

And then build your final query

$sql = $selectSQL.' '.$joinSQL.' '.$whereSQL;

Upvotes: 1

Kris
Kris

Reputation: 912

Joins can be slow, but that's not your problem here. One quick fix: remove those subqueries, why make a subquery instead of the whole table? It makes everything a lot slower.

Second: make sure all keys you use to join are marked as indexes, it can make everything a lot faster.

Upvotes: 0

Obi Ik
Obi Ik

Reputation: 159

Joins are slow, try subqueries, write shorter queries. they might mean more codes but if u collect your timestamp before and after the queries, u'll see the huge difference.

Upvotes: 0

matino
matino

Reputation: 17715

Without knowing what you are actually doing it's hard to tell whether your query can be simplified. Assuming you need information from all tables and all id's are primary keys I'd analyze WHERE clause - do you have indexes properly defined? With large database indexes make huge difference and can greatly improve performance.

Upvotes: 0

Related Questions