Ed Avis

Reputation: 1502

Self-join excluding joining a row to itself

Sometimes you want all pairs of rows in a table, that is, its Cartesian product with itself. That can be done with a join whose condition is always true:

drop table if exists #a
create table #a (x int)
insert into #a values (1),(2),(3)
select *
from #a a1
join #a a2
  on 0 = 0 -- always-true join condition

That gives nine rows.
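
For comparison, the same Cartesian product can be written with T-SQL's cross join, which needs no join condition at all. This is only a sketch against the #a table above; the column aliases x1 and x2 are names chosen here for readability and are not part of the original query.

```sql
-- Every row of #a paired with every row of #a (including itself).
-- x1 and x2 are illustrative aliases so the two columns can be told apart.
select a1.x as x1, a2.x as x2
from #a a1
cross join #a a2;
```

With the three rows inserted above this should again return nine rows.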

If I wanted to get pairs except where a row is paired with itself, I could add a join condition requiring different values of x:

select *
from #a a1
join #a a2
  on a1.x != a2.x

That gives only six rows in the result. But I have assumed that column x is a unique identifier. Suppose it isn't:

delete from #a
insert into #a values (4), (4), (5)

There are two rows in the table holding value 4. They have identical values but are nonetheless separate rows, and SQL does sometimes let you treat them separately (for example with set rowcount 1; update #a set x = 40 where x = 4; set rowcount 0;). I would still like to get all pairs of rows from a self-join, excluding pairs where a row is paired with itself. So my desired output would be (4, 4), (4, 5), (4, 4), (4, 5), (5, 4), (5, 4): still six rows.
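
To illustrate that the two rows holding 4 really can be treated separately, here is a small sketch using update top (1), which limits the statement to a single row much as set rowcount 1 does. It assumes #a currently holds (4), (4), (5), and it is wrapped in a transaction that is rolled back so the table is left unchanged for the queries that follow.

```sql
begin transaction;

-- Only one of the two identical rows is changed; which one is up to the engine.
update top (1) #a set x = 40 where x = 4;

select x from #a;      -- 40, 4 and 5, in no particular order

rollback transaction;  -- put the table back to (4), (4), (5)
```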

I have found one possible answer, which I will post. (Of course, there are answers like "add a second column", "make sure the column x has unique values", or "don't try to make a Cartesian product", but I am interested to find out what facilities T-SQL offers.)

Upvotes: 0

Views: 97

Answers (2)

Ed Avis

Reputation: 1502

A related question, "ROWNUM as a pseudo column equivalent in T-SQL?", suggests this solution:

with t as (select *, row_number() over (order by (select null)) as rownum from #a)
select *
from t a1
join t a2
  on a1.rownum != a2.rownum

Checking the actual query plan (admittedly on this tiny toy table) shows that this doesn't introduce any extra sorting step, so it should be reasonably fast, though perhaps not as fast as the native rownum provided by Oracle.
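
One caveat, offered as a cautious sketch rather than a known failure: SQL Server does not materialize a CTE, so t is evaluated once per reference, and as far as I know nothing guarantees that order by (select null) numbers the rows the same way both times. Storing the numbering in a temporary table first sidesteps the question. The table name #numbered and the aliases x1 and x2 are made up for this example.

```sql
-- Compute the row numbers once and store them, so that both sides
-- of the self-join see exactly the same numbering.
drop table if exists #numbered;

select x, row_number() over (order by (select null)) as rownum
into #numbered
from #a;

-- All pairs of distinct rows, even when their x values are equal.
select a1.x as x1, a2.x as x2
from #numbered a1
join #numbered a2
  on a1.rownum <> a2.rownum;
```

With #a holding (4), (4), (5) this should return the six desired pairs. The trade-off is an extra write to tempdb, which may matter on large tables.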

Upvotes: 1

Ed Avis

Reputation: 1502

select *
from #a a1
join #a a2
  on a1.%%physloc%% != a2.%%physloc%%

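This %%physloc%% is an undocumented virtual column that identifies the row's physical location on disk, which makes it a loose equivalent of Oracle's rowid (rather than rownum, which only numbers the rows of a result). But whereas queries based on such pseudo-columns are fairly common for Oracle, on MSSQL I had never come across %%physloc%% until now.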

According to a comment on "Can %%physloc%% be used as a row identifier or key for an on-the-fly query?", the 'location' of a row might change even while a query is executing, so this isn't a reliable technique.
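
To see what %%physloc%% actually holds, the raw binary value can be passed through sys.fn_PhysLocFormatter, which renders it as (file:page:slot). Both the column and the function are undocumented, so treat this as an exploratory sketch whose behaviour may differ between versions; the alias location is chosen here for illustration.

```sql
-- Show each row's physical location next to its value.
-- Two rows with the same x still get different locations.
select x, sys.fn_PhysLocFormatter(%%physloc%%) as location
from #a;
```

Rows holding identical values of x should show different slot numbers, which is why the join on %%physloc%% can tell them apart, at least for as long as the rows do not move.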

Upvotes: 0
