Reputation: 9017

How to speed up joins

I'm using SQL Server 2008 R2. Returning data to the user is slow because of massive joins: for example, a single query needs 5 inner joins plus 6 left joins (usually against TVFs, sometimes tables), and it takes far too long.

What are the workarounds for this problem? Should I denormalize my database? What are the best practices for avoiding a huge number of joins?

Upvotes: 0

Answers (2)

user1222760

Reputation: 79

Here's a good (over-simplified) example of staging:

Let's say you wanted to find all of the high-risk individuals in your city (might as well be interesting about it). You have a phone company DB (national) indexed by state, city, last name, first name, address, and an FBI DB (global) indexed by last name, first name, country, region, address. Let's say the FBI DB has multiple records for each individual due to multiple past addresses.

You could join the two DBs on the common elements and then qualify your criteria. Or...

    Select RecordID
    From Phone as P1
    Where State = 'MyState' and City = 'MyCity'
      and exists (Select 1 From TheMan as M1
                  Where M1.Last = P1.Last and M1.First = P1.First and M1.Risk > 80)

Now I have a small record-set to qualify and a small result-set to work from. From there I can go get details. That's a good candidate for a CTE, and I could shoot a dozen holes in the logic, but it illustrates the concept. If you bring M1.Risk (a non-indexed field) into the equation with a full join, you're forcing SQL Server to plan against it in certain situations. Not necessarily here, but it matters more as your logic gets more complex and further non-indexed criteria come into play.
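To make the difference concrete, here is a minimal sketch of the same idea, run against an in-memory SQLite database through Python rather than SQL Server. The column names LastName and FirstName (instead of Last and First), the sample rows, and the risk scores are all made up for illustration. It only shows how the row counts differ between a plain join and the exists form when one side has several records per person; it says nothing about how SQL Server would actually plan either query.

```python
import sqlite3

# SQLite is only a stand-in for SQL Server here. Table names follow the
# example above; column names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Phone (RecordID INTEGER PRIMARY KEY, State TEXT, City TEXT,
                        LastName TEXT, FirstName TEXT);
    CREATE TABLE TheMan (LastName TEXT, FirstName TEXT, Address TEXT,
                         Risk INTEGER);

    INSERT INTO Phone VALUES
        (1, 'MyState',    'MyCity',    'Doe', 'John'),
        (2, 'MyState',    'MyCity',    'Roe', 'Jane'),
        (3, 'OtherState', 'OtherCity', 'Doe', 'John');

    -- John Doe has three past addresses, so three rows in TheMan.
    INSERT INTO TheMan VALUES
        ('Doe', 'John', '1 First St',  90),
        ('Doe', 'John', '2 Second St', 90),
        ('Doe', 'John', '3 Third St',  90),
        ('Roe', 'Jane', '4 Fourth St', 10);
""")

# Plain join: one output row per matching TheMan record.
join_rows = conn.execute("""
    SELECT P1.RecordID
    FROM Phone AS P1
    JOIN TheMan AS M1
      ON M1.LastName = P1.LastName AND M1.FirstName = P1.FirstName
    WHERE P1.State = 'MyState' AND P1.City = 'MyCity' AND M1.Risk > 80
""").fetchall()

# EXISTS: each Phone row is returned at most once, however many matches exist.
exists_rows = conn.execute("""
    SELECT P1.RecordID
    FROM Phone AS P1
    WHERE P1.State = 'MyState' AND P1.City = 'MyCity'
      AND EXISTS (SELECT 1 FROM TheMan AS M1
                  WHERE M1.LastName = P1.LastName
                    AND M1.FirstName = P1.FirstName
                    AND M1.Risk > 80)
""").fetchall()

print(len(join_rows))   # 3
print(exists_rows)      # [(1,)]
```

With the join you would still need a DISTINCT or a GROUP BY to get back to one row per person; the exists form never produces the duplicates in the first place.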

Upvotes: 0

user1222760

Reputation: 79

I'd have to see the SQL to troubleshoot specifics, but here are a few things I do when pulling results that have extremely high demand:

Use your tools. Display Estimated Execution Plan can expose some obvious vagaries in your logic.
Learn to love 'where exists' and 'having'. You can sometimes minimize the focus and scope by qualifying in creative ways that don't require HARD IO. This is more true for sub-queries than joins, but I add a clause for every outer join I need.
Most importantly IMO, don't be afraid of staging your results. You sometimes need to process billions/trillions of transactions against millions of records, and what takes hours with joins can be accomplished in minutes or seconds by staging. If you only need x% of your top 2 or 3 tables, why join every record top to bottom? Sometimes it's just too much overhead. Pull your simplest result-set down to a stage table (or temp, whatever you need), index it, and then go after the next chunk. That usually saves me a fortune in memory.
Use CTEs when you can. However, in my experience they degrade beyond a certain point. Nice for ancillary tables but not for serious volume.

Be creative in your combinations. I'll use those exists clauses in Stage 1 (reading Tables a, b and c) to only bring back the records that also exist in tables d, e and f.
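As a rough sketch of the staging pattern described above, the snippet below again uses an in-memory SQLite database from Python as a stand-in for SQL Server. The tables Accounts, Transactions and Watchlist, the stage table Stage1, the index name, and every row are hypothetical. Stage 1 pulls only the qualifying keys (filtered with an exists clause) into a temp table and indexes it; stage 2 joins the large table against that small set.

```python
import sqlite3

# SQLite is only a stand-in for SQL Server; all names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE Transactions (TxnID INTEGER PRIMARY KEY, AccountID INTEGER,
                               Amount REAL);
    CREATE TABLE Watchlist (AccountID INTEGER);

    INSERT INTO Accounts VALUES (1, 'West'), (2, 'West'), (3, 'East');
    INSERT INTO Watchlist VALUES (1), (3);
    INSERT INTO Transactions VALUES
        (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0), (13, 3, 20.0);
""")

# Stage 1: keep only the keys we need, qualified with EXISTS instead of a
# join, then index the small stage table.
conn.executescript("""
    CREATE TEMP TABLE Stage1 AS
        SELECT A.AccountID
        FROM Accounts AS A
        WHERE A.Region = 'West'
          AND EXISTS (SELECT 1 FROM Watchlist AS W
                      WHERE W.AccountID = A.AccountID);

    CREATE INDEX IX_Stage1_AccountID ON Stage1 (AccountID);
""")
staged = conn.execute("SELECT AccountID FROM Stage1").fetchall()

# Stage 2: the big table is joined only against the staged subset.
totals = conn.execute("""
    SELECT S.AccountID, SUM(T.Amount)
    FROM Stage1 AS S
    JOIN Transactions AS T ON T.AccountID = S.AccountID
    GROUP BY S.AccountID
    ORDER BY S.AccountID
""").fetchall()

print(staged)   # [(1,)]
print(totals)   # [(1, 150.0)]
```

In T-SQL the stage step would typically be written as SELECT ... INTO #Stage1 followed by CREATE INDEX on #Stage1. Whether this beats one large join depends on your data volumes, so compare the execution plans before committing to it.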

A lot of the expert SQL advice is not based on very large databases (VLDBs); it's based on Customer, Orders, Demographic type schemas.

Are these stored procs run natively?

Upvotes: 1
