user2129506
user2129506

Reputation: 131

Mysql optimization query on 2 tables

Let db1 and db2 db1.table1

annee   code    code2   var1    ....
1991    11    12    779
1991    11    14    105
1991    11    15    10
1991    12    11    466
1991    12    14    296
1991    12    15    270
1991    14    11    15
1991    14    12    510
1991    14    15    6
1991    15    11    193
1991    15    12    455
1991    15    14    4   
....
1992    11    12    779
1992    11    14    105
1992    11    15    10
1992    12    11    466
1992    12    14    296
1992    12    15    270
1992    14    11    15
1992    14    12    510
1992    14    15    6
1992    15    11    193
1992    15    12    455
1992    15    14    4   
....

db2.table2

var1    code    ...
test  11
test  12
test  14
test2 11
test2 14
test2 15
...

I need to optimize the following query (because db1.table1 contains 8 000 000 rows) :

select annee,sum(var1) from db1.table1 as M where 
M.code in 
(select t1.code from db2.table2 as t1 cross join db2.table2 as t2 where t1.var1='Test2' and t2.var1='Test2' and t1.code <> t2.code) 
and M.code2 in 
(select t2.code from db2.table2 as t1 cross join db2.table2 as t2 where t1.var1='Test2' and t2.var1='Test2' and t1.code <> t2.code) 
group by annee order by annee desc

db1.table1 and db2.table2 are indexed and sorted. Any advice would be much appreciated! Thanks

Upvotes: 0

Views: 58

Answers (3)

Rick James
Rick James

Reputation: 142316

table2: INDEX(var1, code)
table1: INDEX(code, code2, annee)

Change IN ( SELECT ... ) into JOIN ( SELECT ... ) ON ...; the former is poorly optimized.

If you are using MySQL 5.6 or later, then the subqueries may be sufficiently optimized. If you are using an older version, create a TEMPORARY TABLE with that duplicated subquery.

Upvotes: 0

Sergey Menshov
Sergey Menshov

Reputation: 3906

As variant you can try the following

select m.annee,sum(m.var1)
from db1.table1 m
join
  (
    select t1.code code1,t2.code code2
    from db2.table2 t1
    join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
  ) c
on (m.code=c.code1 and m.code2=c.code2) or (m.code=c.code2 and m.code2=c.code1)
group by m.annee
order by m.annee desc

I used JOIN instead CROSS JOIN and JOIN instead two IN.

And if it suit you you can try optimize query

select m.annee,sum(m.var1)
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
join db1.table1 m on (m.code=t1.code and m.code2=t2.code) or (m.code=t2.code and m.code2=t1.code)
group by m.annee
order by m.annee desc

The first JOIN returns all the combinations for test2. There are (11,12) and (11,14)

db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code

And the second JOIN checks rows from table1 on these combinations

join db1.table1 m on (m.code=t1.code and m.code2=t2.code) or (m.code=t2.code and m.code2=t1.code)

Try to check the next variant, too

select m.annee,sum(m.var1)
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<>t2.code
join db1.table1 m on m.code=t1.code and m.code2=t2.code
group by m.annee
order by m.annee desc

If the last variant returns correct result then you can try to add index for (code,code2) into table1

CREATE INDEX idx_table1_code_code2 ON db1.table1 (code,code2)

Upvotes: 2

I've tried to make your query logic more simpler. Hope this help

    select annee,sum(var1) 
        from db1.table1 as M where 
            exists( select var1 from db2.table2 t2 
                        where t2.var1='Test2' 
                        group by t2.var1 
                        having sum(t2.code = M.code) >= 1 
                            and sum(t2.code = M.code2) >= 1 
                            and (M.code != M.code2 or sum(t2.code != M.code) >= 1))

        group by annee 
        order by annee desc

Upvotes: 1

Related Questions