Alexandre_Almeida
Alexandre_Almeida

Reputation: 141

Multiples INNER JOIN is too slow SQL SERVER

I'm having a performance problem.

I created a table that receives data from a file, I do a BULK INSERT. Then I do a SELECT with multiple INNER JOINs (11 inner joins) to insert into another table with the right data.

When I run this SELECT, it takes too long (more than a hour) and then I stop it. My solution was to break this query into 3, creating @temp tables. To my surprise, that takes 3 minutes. That's what I'm trying to understand, WHY breaking my query into 3 was FASTER than one select statement. Here is my query:

SELECT t1.ReturnINT, t1.ReturnBIT, t2.ReturnINT, t3.ReturnINT, t5.ReturnINT, t1.ReturnDateTime
FROM t1
INNER JOIN t2
    ON t2.my_column_varchar = t1.my_column_varchar
INNER JOIN t3
    ON t3.my_column_number = t1.my_column_number AND t2.my_column_ID = t3.my_column_ID
INNER JOIN t4
    ON t4.my_column_varchar = t1.my_column_varchar
INNER JOIN t5
    ON t5.my_column_int = t1.my_column_int AND t5.my_column_int = t4.my_column_int AND t2.my_column_int = t5.my_column_int
INNER JOIN t6
    ON t6.my_column_int = t5.my_column_int AND t6.my_column_int = t2.my_column_int
INNER JOIN t7
    ON t7.my_column_int = t6.my_column_int
INNER JOIN t8
    ON t8.my_column_int = t3.my_column_int AND t8.my_column_datetime = t1.my_column_datetime
INNER JOIN t9
    ON t9.my_column_int = t3.my_column_int AND t8.my_column_datetime BETWEEN t9.my_column_datetime1 AND t9.datetime1 + t9.my_column_datetime2
INNER JOIN t10
    ON t10.my_column_int = t9.my_column_int AND t10.my_column_int = t6.my_column_int
INNER JOIN t11
    ON t11.my_column_int = t9.my_column_int AND t8.my_column_datetime = t11.my_column_datetime

----EDITED----

There is NO where clause, my query is exactly as I put here.

Here is my broken querys, i forget to put them here. It runs in 3 minutes.

DECLARE @temp TABLE (
    <Some_columns>
)
INSERT INTO @temp
    SELECT <My_Linked_Columns>
    FROM t1
    INNER JOIN t2
        ON t2.my_column_varchar = t1.my_column_varchar
    INNER JOIN t3
        ON t3.my_column_number = t1.my_column_number AND t2.my_column_ID = t3.my_column_ID
    INNER JOIN t4
        ON t4.my_column_varchar = t1.my_column_varchar
    INNER JOIN t5
        ON t5.my_column_int = t1.my_column_int AND t5.my_column_int = t4.my_column_int AND t2.my_column_int = t5.my_column_int


DECLARE @temp2 TABLE(
    <Some_Columns>
)
INSERT INTO @temp2
    SELECT <More_Linked_Columns>
    FROM @temp as temp
    INNER JOIN t6
        ON t6.my_column_int = temp.my_column_int AND t6.my_column_int = temp.my_column_int
    INNER JOIN t7
        ON t7.my_column_int = t6.my_column_int
    INNER JOIN t8
        ON t8.my_column_int = temp.my_column_int AND t8.my_column_datetime = temp.my_column_datetime


DECLARE @temp3 TABLE(
    <Some_Columns>
)
INSERT INTO @temp3
    SELECT <More_Linked_Columns>
    FROM @temp2 AS temp2
    INNER JOIN t9
        ON t9.my_column_int = temp2.my_column_int AND temp2.my_column_datetime BETWEEN t9.my_column_datetime1 AND t9.datetime1 + t9.my_column_datetime2
    INNER JOIN t10
        ON t10.my_column_int = t9.my_column_int AND t10.my_column_int = temp2.my_column_int
    INNER JOIN t11
        ON t11.my_column_int = t9.my_column_int AND temp2.my_column_datetime = t11.my_column_datetime


SELECT <All_Final_Columns>
FROM @temp3

----EDITED 3----

Studying more things I discovered a problem in execution plan. I have a Nested Loop that estimates 1 row but it actually returns 1.204.014 rows. I guess the problem is exactly here, but I didn't find out how to solve this problem without breaking my query in 3 parts (Now I know why breaking it is faster hehehe)

Upvotes: 7

Views: 19946

Answers (4)

Muab Nhoj
Muab Nhoj

Reputation: 121

Could be a few different things but from what it sounds like, you have checked your indexes and execution plan. I'd suggest Adam Mechanic's video about 'row goals' and using the top statement, which would work similarly to how the temp tables worked.

SELECT *
FROM
  (SELECT top (2000000000) t1.ReturnINT,
          t1.ReturnBIT,
          t2.ReturnINT,
          t3.ReturnINT,
          t5.ReturnINT,
          t1.ReturnDateTime
   FROM t1
   INNER JOIN t2 ON t2.my_column_varchar = t1.my_column_varchar
   INNER JOIN t3 ON t3.my_column_number = t1.my_column_number
       AND t2.my_column_ID = t3.my_column_ID
   INNER JOIN t4 ON t4.my_column_varchar = t1.my_column_varchar
   INNER JOIN t5 ON t5.my_column_int = t1.my_column_int
       AND t5.my_column_int = t4.my_column_int
       AND t2.my_column_int = t5.my_column_int
   INNER JOIN t6 ON t6.my_column_int = t5.my_column_int
       AND t6.my_column_int = t2.my_column_int
   INNER JOIN t7 ON t7.my_column_int = t6.my_column_int
   INNER JOIN t8 ON t8.my_column_int = t3.my_column_int
       AND t8.my_column_datetime = t1.my_column_datetime
   INNER JOIN t9 ON t9.my_column_int = t3.my_column_int
       AND t8.my_column_datetime BETWEEN t9.my_column_datetime1 
       AND t9.datetime1 + t9.my_column_datetime2
   INNER JOIN t10 ON t10.my_column_int = t9.my_column_int
       AND t10.my_column_int = t6.my_column_int
   INNER JOIN t11 ON t11.my_column_int = t9.my_column_int
       AND t8.my_column_datetime = t11.my_column_datetime
)

I had a similar issue but it had a where clause after several joins and took a 10 minute query and reduced it to 39 seconds.

Upvotes: 0

John
John

Reputation: 4006

With the right index or indexes the original query should execute very quickly (lest than one second if you page the data). Don't use temp tables as a hack for not being able to come up with a reasonable query.

Upvotes: 0

Dave.Gugg
Dave.Gugg

Reputation: 6771

In general, you want the query optimizer to join the tables in such a way that will limit the result set as much as possible. If you have table A with 1 million rows, table B with 1 million rows, and table C with 10 rows, you'd want to inner join from table C to A or B first, this would give you at most 10 records (assuming a 1:1 match) to then join to the last table. If you joined A to B first you'd be joining on all 1 million rows from each, which would take quite a bit longer.

Usually the query optimizer is "good enough" at picking the join order, but in your case it was not. The best method I've seen for forcing the join order was demonstrated by Adam Mechanic in a blog post here. It involves using a TOP clause on the tables you want to begin the join from. The query optimizer will then get the result set from those tables first, and you can really limit the total number of rows and increase query performance. I use this method whenever possible.

Upvotes: 2

DhruvJoshi
DhruvJoshi

Reputation: 17126

Most common reasons:

Reason 1: When two tables having n and m rows participating in INNER JOIN have many to many relationship, then the INNER JOIN can near a CROSS JOIN and can produce result set with more than MAX(n,m) rows, theoretically n x m rows are possible.

Now imagine many such tables in INNER JOIN.

This will result in the result set becoming bigger and bigger and will start eating into the allocated memory area.

This could be a reason why temp tables might help you.

Reason 2: You do not have INDEX built on the columns you are joining tables on.

Reason 3: Do you have functions in WHERE clause?

Upvotes: 7

Related Questions