Reputation: 393
Coding in T-Sql since three mounths or so, I've just seen for the first time the use of a CROSS JOIN
in an UPDATE
statement in some code and I'm not able to figure out the use cases of such a construct.
Does anyone know?
Edit: here is a sample code of what I can't understand well yet.
UPDATE a
SET a.COL1 = b.COL1
FROM Table1 AS a
CROSS JOIN Table2 AS b
And there are other updates in the code that provide a WHERE
clause like:
UPDATE a
SET a.COL1 = b.COL1
FROM Table1 AS a
CROSS JOIN Table2 AS b
WHERE condition_on_columns_from_a_and_from_b
And the point is that for each row of Table1, a select on the the cross join with the filtering returns more than a row.
I'm a bit confused with the understanding of the behavior.
PS: the table Table1 takes more than 5 giga bytes of space..
Upvotes: 3
Views: 9101
Reputation: 239774
Okay, with this query:
UPDATE a
SET COL1 = b.COL1
FROM Table1 AS a
CROSS JOIN Table2 AS b
WHERE condition_on_columns_from_a_and_from_b
If we take the set formed by a CROSS JOIN b
(and before considering the FROM
clause), then we have a Cartesian product, where every row from a
is paired with every row from b
.
If we now consider the WHERE
clause - unless this WHERE
clause is sufficient to guarantee that each row from a
is only represented once, then we will have an indeterminate result. That is, if there are two rows in the set which are both derived from the same row from a
(but different rows from b
), then there is no way to know, for sure, which of those two rows will be used to compute the SET a.COL1 = b.COL1
assignment.
I don't think it's even guaranteed, if we had the following:
UPDATE a
SET COL1 = b.COL1, COL2 = b.COL2
FROM --As before
that the same row from b
will be used for both assignments.
All of the above is true for any UPDATE
statement using the T-SQL FROM
clause extension - unless you're careful to constrain your join conditions, then multiple assignments for the same row may be possible. But a CROSS JOIN
just seems to make it far more likely to occur. And SQL Server issues no diagnostic messages if this occurs.
Upvotes: 0
Reputation: 96610
To understand the use case, you would need to look at the data. I can easily see using the first update if I was positive tableb would always and only contain one record. This is especially true of that one record has no field to join to table A on. In this case you are updating all the fields in table a with the value of that field in table b. Normally this type of thing where all records are updated woudl only be for resetting values.
To see what would be updated, do this:
UPDATE a
SET a.COL1 = b.COL1
--select a.COL1,b.COL1, *
FROM Table1 AS a
CROSS JOIN Table2 AS b
WHERE condition_on_columns_from_a_and_from_b
Now you can run just the select part to see what value a.col1 would be replaced with and see the other fields in the tables to see if the join and where clasue appear to be correct. This will help you understand what the corss join is doing. YOu could then temporarily replace the cross join with a left join and an inner join to understand what behavior it has that is differnt than the other types of joins. Play around with the select for awhile until you really understand what is happening. I never write an update without having the select in comments so I can ensure I am updating what I think I should be before I move the code to prod. This is espcially true if you write complex updates like I do that could involve ten or fifteen joins and several where conditions.
Upvotes: 0
Reputation: 56755
There is no good reason that I can imagine to do this. The query is either written incorrectly, or just a test to slow down your system or to invalidate the target table's data (or perhaps, just to see what it does).
It will probably set COL1 of every row in Table1 to the same single random value from Table2's COL1 (though probably either the first or last such value). But it will do so very inefficiently (unless the optimizer in later versions of SQL Server have optimized out this useless case, I haven't tested it in years myself).
Upvotes: 2
Reputation: 69703
A cross join generates the cartesian product of two tables. This means it combines EVERY row of table A with EVERY row of table B. When Table A has n rows and table B has m rows, the result set has n*m rows.
Upvotes: 3