Are these cross joins causing me problems?

Question

I have a poor man's replication setup that I can't do anything about. Some identifying data (basically primary key) from a call_table is copied into another table via a simple trigger, and then the "replication server" runs a stored procedure to copy the data from the queue table to a #temp table (to prevent locking in SQL 6.5 is the case that was made to me). Finally, a query uses the key data from the temp table to pull data back to the replication server from the call_table using this query:

  /* select the data to return to poor man replication server */     
  SELECT c.id,
         c.date,
         c.time,
         c.duration,
         c.location
    FROM #tmp q, call_table c (NOLOCK)
   WHERE q.id=c.id  
     AND q.date=c.date
     AND q.time=c.time
     AND q.duration=c.duration
     AND q.location=c.location
GROUP BY c.id,
         c.date,
         c.time,
         c.duration,
         c.location

Once a night the queue table is purged and this starts over. While investigating this, the implicit cross join jumped at me (I'm on the side that they are usually evil), but then I read The power of the Cross Join. I'm here because I'm not quite convinced. Say the temp table has about 10,000 rows for the day, the call_table has about 100,000 for the month so far. How is this query going to work? Does it mash the two tables together for a total of 1,000,000,000 in memory, then use the group clause to trim it back down? Could you explain what steps SQL takes to compile the results?

Execution Plans:

My Query:
      |--Hash Match Root(Aggregate, HASH:([c].[id], [c].[date], [c].[location], [c].[time], [c].[duration]), RESIDUAL:(((((((((((((((((((((([c].[id]=[c].[id] AND [c].[PIN]=[c].[PIN]) AND [c].[ORIG]=[c].[ORIG]) AND [c].[date]=[c].[date]) AND [c].[CTIME]=[c].[CTIME
           |--Hash Match Team(Inner Join, HASH:([q].[id], [q].[date], [q].[location], [q].[time], [q].[duration])=([c].[id], [c].[date], [c].[location], [c].[time], [c].[duration]), RESIDUAL:(((([c].[id]=[q].[id] AND [c].[location]=[q].[location]) AND [c].[duration]=[q].[duration]) AND [
                |--Table Scan(OBJECT:([db].[dbo].[queue] AS [q]))
                |--Table Scan(OBJECT:([db].[dbo].[call_table] AS [c]))

Yours:              
  |--Merge Join(Right Semi Join, MERGE:([q].[id], [q].[date], [q].[time], [q].[duration], [q].[location])=([c].[id], [c].[date], [c].[time], [c].[duration], [c].[location]), RESIDUAL:(((([q].[id]=[c].[id] AND [q].[location]=[c].[location]) AND [q].[duration]=[c].[duration]) AND [q].[
       |--Index Scan(OBJECT:([db].[dbo].[queue].[PK_queue] AS [q]), ORDERED)
       |--Sort(ORDER BY:([c].[id] ASC, [c].[date] ASC, [c].[time] ASC, [c].[duration] ASC, [c].[location] ASC))
            |--Table Scan(OBJECT:([db].[dbo].[call_table] AS [c]))

Quassnoi · Accepted Answer

The query you described is no way a CROSS JOIN.

SQL Server is smart enough to transform the WHERE condition into the JOIN's.

However, I see no point in GROUP BY here.

This query:

  SELECT c.id,
         c.date,
         c.time,
         c.duration,
         c.location
    FROM #tmp q, call_table c (NOLOCK)
   WHERE q.id=c.id  
     AND q.date=c.date
     AND q.time=c.time
     AND q.duration=c.duration
     AND q.location=c.location
GROUP BY c.id,
         c.date,
         c.time,
         c.duration,
         c.location

can be easilty rewritten as

  SELECT  c.id,
          c.date,
          c.time,
          c.duration,
          c.location
  FROM    call_table c (NOLOCK)
  WHERE   EXISTS
          (
          SELECT  NULL
          FROM    #tmp q
          WHERE   q.id = c.id  
                  AND q.date = c.date
                  AND q.time = c.time
                  AND q.duration = c.duration
                  AND q.location = c.location
          )

, provided that c.id is a PRIMARY KEY.

If it's not, just add DISTINCT to SELECT above.

Update:

From your plan I see that that your query uses HASH JOIN, while my uses MERGE SEMI JOIN.

The latter one is usually more efficient if you have an ordered set, but for some reason the query does not use the composite index you created, but instead performs full table scan.

This is strange, since all your values are contained within the index.

Probably (probably) this is because your fields allow NULL's.

Make sure that you use only the fields from the composite index both in WHERE conditions and in SELECT clause and, if possible, make them NOT NULL.

This should make your query use preordered resultsets in MERGE SEMI JOIN. You can tell it if you see neither TABLE SCAN nor SORT in the plan, just two INDEX SCAN's.

And two more questions:

Is c.id a PRIMARY KEY on call_table?
Is q.id a PRIMARY KEY on #tmp?

If answer to both questions is yes, then you will benefit from doing two things:

Defining your PRIMARY KEY as CLUSTERED on both tables

Rewriting your query as this:

 SELECT  c.id,
         c.date,
         c.time,
         c.duration,
         c.location
 FROM    call_table c (NOLOCK)
 JOIN    #tmp q
 ON      q.id = c.id  
         AND q.date = c.date
         AND q.time = c.time
         AND q.duration = c.duration
         AND q.location = c.location

Are these cross joins causing me problems?

Execution Plans:

Answers (2)

Related Questions