illvm
illvm

Reputation: 1346

Aggregating two selects with a group by in SQL is really slow

I am currently working with a query in in MSSQL that looks like:

SELECT
   ...
FROM
(SELECT
   ...
)T1
JOIN
(SELECT
   ...
)T2
GROUP BY
...

The inner selects are relatively fast, but the outer select aggregates the inner selects and takes an incredibly long time to execute, often timing out. Removing the group by makes it run somewhat faster and changing the join to a LEFT OUTER JOIN speeds things up a bit as well.

Why would doing a group by on a select which aggregates two inner selects cause the query to run so slow? Why does an INNER JOIN run slower than a LEFT OUTER JOIN? What can I do to troubleshoot this further?

EDIT: What makes this even more perplexing is the two inner queries are date limited and the overall query only runs slow when looking at date ranges between the start of July and any other day in July, but if the date ranges are anytime before the the July 1 and Today then it runs fine.

Upvotes: 1

Views: 1181

Answers (6)

Ty.
Ty.

Reputation: 3948

JOIN = Cartesian Product. All columns from both tables will be joined in numerous permutations. It is slow because the inner queries are querying each of the separate tables, but once they hit the join, it becomes a Cartesian product and is more difficult to manage. This would occur at the outer select statement.

Have a look at INNER JOINs as Tetraneutron recommended.

Upvotes: 0

illvm
illvm

Reputation: 1346

The issue was with fragmented data. After the data was defragmented the query started running within reasonable time constraints.

Upvotes: 0

jn29098
jn29098

Reputation: 1415

After your t2 statement add a join condition on t1.joinfield = t2.joinfield

Upvotes: 0

Spence
Spence

Reputation: 29372

Have you given a join predicate? Ie join table A ON table.ColA = table.ColB. If you don't give a predicate then SQL may be forced to use nested loops, so if you have a lot of rows in that range it would explain a query slow down.

Have a look at the plan in the SQL studio if you have MS Sql Server to play with.

Upvotes: 0

ttarchala
ttarchala

Reputation: 4587

Try rewriting your query without the nested SELECTs, which are rarely necessary. When using nested SELECTs - except for trivial cases - the inner SELECT resultsets are not indexed, which makes joining them to anything slow.

As Tetraneutron said, post details of your query -- we may help you rewrite it in a straight-through way.

Upvotes: 0

Tetraneutron
Tetraneutron

Reputation: 33861

Without some more detail of your query its impossible to offer any hints as to what may speed your query up. A possible guess is the two inner queries are blocking access to any indexes which might have been used to perform the join resulting in large scans but there are probably many other possible reasons.

To check where the time is used in the query check the execution plan, there is a detailed explanation here

http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx

The basic run down is run the query, and display the execution plan, then look for any large percentages - they are what is slowing your query down.

Upvotes: 1

Related Questions