raza19
raza19

Reputation: 23

mysql:choosing the most efficient query from the two

Both of these mysql queries produce exactly the same result but query A is a simple union and it has the where postType clause embedded inside individual queries whereas query B has the same where clause applied to the external select of the virtual table which is a union of individual query results. I am concerned that the virtual table sigma from query B might get too large for no good reason if there are a lot of rows but then I am bit confused because how would the order by work for query A ; would it also not have to make a virtual table or something like that for sorting results. All may depend on how order by works for a union ? If order by for a union is also making a temp table ; would then query A almost equate to query B in resources(it will be much easier for us to implement query B in our system compared to query A)? Please guide/advise in any way possible, thanks

Query A

SELECT `t1`.*, `t2`.*
            FROM `t1` INNER JOIN `t2` ON
            `t1`.websiteID= `t2`.ownerID
            AND `t1`.authorID= `t2`.authorID
            AND `t1`.authorID=1559 AND `t1`.postType="simplePost"
UNION
            SELECT `t1`.*
             FROM `t1` where websiteID=1559 AND postType="simplePost" 
ORDER BY postID limit 0,50

Query B

Select * from (
SELECT `t1`.*,`t2`.*
            FROM `t1` INNER JOIN `t2` ON
             `t1`.websiteID= `t2`.ownerID
              AND `t1`.authorID= `t2`.authorID
            AND `t1`.authorID=1559
UNION
            SELECT `t1`.*
             FROM `t1` where websiteID=1559
)
As sigma  where postType="simplePost" ORDER BY postID limit 0,50

EXPLAIN FOR QUERY A

id    type                table           type    possible_keys   keys            key_len ref         rows    Extra
1     PRIMARY             t2              ref     userID          userID          4       const       1       
1     PRIMARY             t1              ref     authorID        authorID        4       const       2       Usingwhere
2     UNION               t1              ref     websiteID       websiteID       4       const       9       Usingwhere
NULL  UNIONRESULT         <union1,2>      ALL     NULL            NULL            NULL    NULL        NULL    Usingfilesort

EXPLAIN FOR QUERY B

id    select_type     table       type    possible_keys   key         key_len     ref     rows    Extra
1     PRIMARY         <derived2>  ALL     NULL            NULL        NULL        NULL    10      Using where; Using filesort
2     DERIVED         t2          ref     userID          userID          4               1   
2     DERIVED         t1          ref     authorID        authorID        4               2       Using where
3     UNION           t1          ref     websiteID       websiteID       4               9   
NULL  UNION RESULT    <union2,3>  ALL     NULL            NULL        NULL        NULL    NULL

Upvotes: 1

Views: 116

Answers (2)

Bohemian
Bohemian

Reputation: 425258

There is no doubt that version 1 - separate where clauses in each side of the union - will be faster. Let's look at why version - where clause over the union result - is worse:

  • data volume: there's always going to be more rows in the union result, because there are less conditions on what rows are returned. This means more disk I/O (depending on indexes), more temporary storage to hold the rowset, which means more processing time
  • repeated scan: the entire result of the union must be scanned again to apply the condition, when it could have been handled during the initial scan. This means double handling the rowset, albeit probably in-memory, still it's extra work.
  • indexes aren't used for where clauses on a union result. If you have an index over the foreign key fields and postType, it would not be used

If you want maximum performance, use UNION ALL, which passes the rows straight out into the result with no overhead, instead of UNION, which removes duplicates (usually by sorting) and can be expensive and is unnecessary based in your comments

Define these indexes and use version 1 for maximum performance:

create index t1_authorID_postType on t1(authorID, postType);
create index t1_websiteID_postType on t1(websiteID, postType);

Upvotes: 2

Paul Maxwell
Paul Maxwell

Reputation: 35603

perhaps this would work in lieu:

SELECT
     `t1`.*
    ,`t2`.*
FROM `t1`
LEFT JOIN `t2` ON `t1`.websiteID = `t2`.ownerID
    AND `t1`.authorID = `t2`.authorID
    AND `t1`.authorID = 1559
WHERE ( `t1`.authorID = 1559 OR `t1`.websiteID = 1559 )
    AND `t1`.postType = 'simplePost'
ORDER BY postID limit 0 ,50

Upvotes: 0

Related Questions