Kamel Keb
Kamel Keb

Reputation: 393

Why and when to use CROSS JOIN instead of INNER JOIN with UPDATE statements?

Coding in T-Sql since three mounths or so, I've just seen for the first time the use of a CROSS JOIN in an UPDATE statement in some code and I'm not able to figure out the use cases of such a construct.

Does anyone know?

Edit: here is a sample code of what I can't understand well yet.

UPDATE a
SET a.COL1 = b.COL1
FROM Table1 AS a
CROSS JOIN Table2 AS b

And there are other updates in the code that provide a WHERE clause like:

UPDATE a
SET a.COL1 = b.COL1
FROM Table1 AS a
CROSS JOIN Table2 AS b
WHERE condition_on_columns_from_a_and_from_b

And the point is that for each row of Table1, a select on the the cross join with the filtering returns more than a row.

I'm a bit confused with the understanding of the behavior.

PS: the table Table1 takes more than 5 giga bytes of space..

Upvotes: 3

Views: 9101

Answers (4)

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239774

Okay, with this query:

UPDATE a
SET COL1 = b.COL1
FROM Table1 AS a
CROSS JOIN Table2 AS b
WHERE condition_on_columns_from_a_and_from_b

If we take the set formed by a CROSS JOIN b (and before considering the FROM clause), then we have a Cartesian product, where every row from a is paired with every row from b.

If we now consider the WHERE clause - unless this WHERE clause is sufficient to guarantee that each row from a is only represented once, then we will have an indeterminate result. That is, if there are two rows in the set which are both derived from the same row from a (but different rows from b), then there is no way to know, for sure, which of those two rows will be used to compute the SET a.COL1 = b.COL1 assignment.

I don't think it's even guaranteed, if we had the following:

UPDATE a
SET COL1 = b.COL1, COL2 = b.COL2
FROM --As before

that the same row from b will be used for both assignments.

All of the above is true for any UPDATE statement using the T-SQL FROM clause extension - unless you're careful to constrain your join conditions, then multiple assignments for the same row may be possible. But a CROSS JOIN just seems to make it far more likely to occur. And SQL Server issues no diagnostic messages if this occurs.

Upvotes: 0

HLGEM
HLGEM

Reputation: 96610

To understand the use case, you would need to look at the data. I can easily see using the first update if I was positive tableb would always and only contain one record. This is especially true of that one record has no field to join to table A on. In this case you are updating all the fields in table a with the value of that field in table b. Normally this type of thing where all records are updated woudl only be for resetting values.

To see what would be updated, do this:

UPDATE a
SET a.COL1 = b.COL1
--select a.COL1,b.COL1, *
FROM Table1 AS a
CROSS JOIN Table2 AS b
WHERE condition_on_columns_from_a_and_from_b

Now you can run just the select part to see what value a.col1 would be replaced with and see the other fields in the tables to see if the join and where clasue appear to be correct. This will help you understand what the corss join is doing. YOu could then temporarily replace the cross join with a left join and an inner join to understand what behavior it has that is differnt than the other types of joins. Play around with the select for awhile until you really understand what is happening. I never write an update without having the select in comments so I can ensure I am updating what I think I should be before I move the code to prod. This is espcially true if you write complex updates like I do that could involve ten or fifteen joins and several where conditions.

Upvotes: 0

RBarryYoung
RBarryYoung

Reputation: 56755

There is no good reason that I can imagine to do this. The query is either written incorrectly, or just a test to slow down your system or to invalidate the target table's data (or perhaps, just to see what it does).

It will probably set COL1 of every row in Table1 to the same single random value from Table2's COL1 (though probably either the first or last such value). But it will do so very inefficiently (unless the optimizer in later versions of SQL Server have optimized out this useless case, I haven't tested it in years myself).

Upvotes: 2

Philipp
Philipp

Reputation: 69703

A cross join generates the cartesian product of two tables. This means it combines EVERY row of table A with EVERY row of table B. When Table A has n rows and table B has m rows, the result set has n*m rows.

Upvotes: 3

Related Questions