YogeshR
YogeshR

Reputation: 1736

How to identify columns which are best candidate for Indexing

I have read a lot of thread on identifying columns which can be the best candidate for creating indexes. However, most of them advise creating indexes on columns which are used in JOIN or WHERE clause.

Still, I am not sure for complex queries like below

select b.col1 ,a.col1 ,a.selectionId ,a.table2Id ,a.selectionName ,b.UserId,b.ParantId ,
a.teamType ,d.Name,b.isBack,b.SelectionId,b.Admin AS admin,
b.Master AS master,c.MstDate AS MstDate, c.col2 
from  tblselection a 
       join  table1 c on c.col1 = a.col1 
       join  table2 d on d.Id = a.table2Id 
        left join  tabletest b on b.SelectionId = a.selectionId and a.table2Id = b.table2Id and b.IsMatched = 1
       where ((ifnull(c.active,0) = 1) and isnull(b.Result) and isnull(b.ResultID)) 

Which columns are the best candidate for indexing in all 4 tables for this query?

I Should create composite indexes or individual indexes here?

Upvotes: 1

Views: 907

Answers (3)

Rick James
Rick James

Reputation: 142298

First, change

    where  ((ifnull(c.active,0) = 1)
              and  isnull(b.Result)
              and  isnull(b.ResultID))

to

    WHERE c.active = 1
      AND b.Result IS NULL
      AND b.ResultID IS NULL

I doubt if the Optimizer handles IFNULL() well.

Please provide SHOW CREATE TABLE and EXPLAIN SELECT...

You will probably now see that c is the first table in EXPLAIN. Let's help the Optimizer by providing

c:  INDEX(active, col1)  -- in that order

After that,

a:  INDEX(col1)
d:  INDEX(Id)  -- unless it is already PRIMARY KEY(Id)
b:  INDEX(IsMatched, table2Id, SelectionId)  -- in any order

When dealing with JOIN, you need to figure out what order the tables will be looked at. Do not trust EXPLAIN because it (initially) does not know what indexes you are about to add.

In your case, the WHERE referenced c and b. But b is a LEFT JOIN, so we can't use it. That leaves c. The only useful column to index is active (after reformulating). But that sounds like a simple, low-cardinality, "flag" that the Optimizer will shun. I added something to the index in hopes that the optimizer will be tricked into using it.

So, what will the "next" table be? The JOINs get from c only to a. So a is next. And the JOIN uses col1.

Next could be either d or b. Looks like the `Optimizer could do them in either order, and will Optimize things the same regardless of the order.

d: Id is how to get to it.

b needs 3 things in the ON to be checked with a simple AND. So a "composite" index of those 3 columns, in any order. (No, 'cardinality' of each column does not matter, even for picking the order.)

As for the b... IS NULL tests in the WHERE clause -- Those are presumably testing whether LEFT did not find a row. (Usually only the PK is checked, so I don't know whether you are also checking something else.)

One more optimization... Sometimes it is useful to build a "covering" index. That is an INDEX that contains all the columns (for a table) mentioned anywhere in the SELECT. This does not seem practical here, since it would involve lots of columns in each of the 4 cases.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269873

Basically, you want to start with indexable columns in the where clause. You don't have any (because of the function calls). So, then start with the join columns:

from tblselection a join
     table1 c
     on c.col1 = a.col1 join
     table2 d
     on d.Id = a.table2Id left join
     tabletest 
     on b.SelectionId = a.selectionId and a.table2Id = b.table2Id and b.IsMatched = 1

I would start with table1(col1), table2(id), and tabletest(SelectionId, table2Id, IsMatched).

Upvotes: 0

Cfreak
Cfreak

Reputation: 19309

Run your query with the EXPLAIN keyword in front of it (EXPLAIN SELECT b.cl1, a.col1 ...). MySQL will give you a breakdown of how things are joined together and how many rows it scans to figure it out.

Typically you want indexes on the columns that get referenced, especially if it needs to look at a lot of rows or if it's using "Table-scan" to join them. You want it to say "using primary" or "using index"

Here's some example output from one of my databases. You can see this query joins 3 tables but in the end finds what it needs very quickly even though the tables are fairly large. Each sub-part only needs to reference one row:

+----+-------------+-------+------------+--------+---------------------+---------+---------+-------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys       | key     | key_len | ref                           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+---------------------+---------+---------+-------------------------------+------+----------+-------------+
|  1 | SIMPLE      | sa    | NULL       | ref    | choice_id,user_id   | user_id | 5       | const                         |    1 |   100.00 | Using where |
|  1 | SIMPLE      | qc    | NULL       | eq_ref | PRIMARY,question_id | PRIMARY | 4       | sa.choice_id                  |    1 |   100.00 | Using where |
|  1 | SIMPLE      | q     | NULL       | eq_ref | PRIMARY             | PRIMARY | 4       | qc.question_id                |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+--------+---------------------+---------+---------+-------------------------------+------+----------+-------------+

Upvotes: 1

Related Questions