Reputation: 131
Let db1 and db2 db1.table1
annee code code2 var1 ....
1991 11 12 779
1991 11 14 105
1991 11 15 10
1991 12 11 466
1991 12 14 296
1991 12 15 270
1991 14 11 15
1991 14 12 510
1991 14 15 6
1991 15 11 193
1991 15 12 455
1991 15 14 4
....
1992 11 12 779
1992 11 14 105
1992 11 15 10
1992 12 11 466
1992 12 14 296
1992 12 15 270
1992 14 11 15
1992 14 12 510
1992 14 15 6
1992 15 11 193
1992 15 12 455
1992 15 14 4
....
db2.table2
var1 code ...
test 11
test 12
test 14
test2 11
test2 14
test2 15
...
I need to optimize the following query (because db1.table1 contains 8 000 000 rows) :
select annee,sum(var1) from db1.table1 as M where
M.code in
(select t1.code from db2.table2 as t1 cross join db2.table2 as t2 where t1.var1='Test2' and t2.var1='Test2' and t1.code <> t2.code)
and M.code2 in
(select t2.code from db2.table2 as t1 cross join db2.table2 as t2 where t1.var1='Test2' and t2.var1='Test2' and t1.code <> t2.code)
group by annee order by annee desc
db1.table1 and db2.table2 are indexed and sorted. Any advice would be much appreciated! Thanks
Upvotes: 0
Views: 58
Reputation: 142316
table2: INDEX(var1, code)
table1: INDEX(code, code2, annee)
Change IN ( SELECT ... )
into JOIN ( SELECT ... ) ON ...
; the former is poorly optimized.
If you are using MySQL 5.6 or later, then the subqueries may be sufficiently optimized. If you are using an older version, create a TEMPORARY TABLE
with that duplicated subquery.
Upvotes: 0
Reputation: 3906
As variant you can try the following
select m.annee,sum(m.var1)
from db1.table1 m
join
(
select t1.code code1,t2.code code2
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
) c
on (m.code=c.code1 and m.code2=c.code2) or (m.code=c.code2 and m.code2=c.code1)
group by m.annee
order by m.annee desc
I used JOIN
instead CROSS JOIN
and JOIN
instead two IN
.
And if it suit you you can try optimize query
select m.annee,sum(m.var1)
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
join db1.table1 m on (m.code=t1.code and m.code2=t2.code) or (m.code=t2.code and m.code2=t1.code)
group by m.annee
order by m.annee desc
The first JOIN
returns all the combinations for test2
. There are (11,12) and (11,14)
db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<t2.code
And the second JOIN
checks rows from table1
on these combinations
join db1.table1 m on (m.code=t1.code and m.code2=t2.code) or (m.code=t2.code and m.code2=t1.code)
Try to check the next variant, too
select m.annee,sum(m.var1)
from db2.table2 t1
join db2.table2 t2 on t1.var1='Test2' and t2.var1=t1.var1 and t1.code<>t2.code
join db1.table1 m on m.code=t1.code and m.code2=t2.code
group by m.annee
order by m.annee desc
If the last variant returns correct result then you can try to add index for (code,code2)
into table1
CREATE INDEX idx_table1_code_code2 ON db1.table1 (code,code2)
Upvotes: 2
Reputation: 4518
I've tried to make your query logic more simpler. Hope this help
select annee,sum(var1)
from db1.table1 as M where
exists( select var1 from db2.table2 t2
where t2.var1='Test2'
group by t2.var1
having sum(t2.code = M.code) >= 1
and sum(t2.code = M.code2) >= 1
and (M.code != M.code2 or sum(t2.code != M.code) >= 1))
group by annee
order by annee desc
Upvotes: 1