codepleb
codepleb

Reputation: 10571

Why do I need aliases in self join?

I'm learning SQL and happen to hang on a point that confuses me:

I got 1 table with 1 column (boys.boy) while boy is a varchar.

SELECT boys.boy, boys.boy 
FROM boys 
LEFT JOIN boys 
  ON length(boy) > length(boy)

Why am I unable to do this? Would this break some invariant, or why did the inventors explicitly bring in "self-joins"?

In other words, this works ("self join"):

SELECT b1.boy, b2.boy 
FROM boys AS b1 
INNER JOIN boys AS b2 
  ON length(b1.boy) > length(b2.boy)

The magic lies in the aliases (AS ...).

Upvotes: 1

Views: 3198

Answers (2)

Andrey Korneyev
Andrey Korneyev

Reputation: 26856

You have to use aliases in this case.

Otherwise server can't distinguish one boys.boy from another, and this construction

length(boy) > length(boy)

is pretty ambiguous - since it can be interpreted as

length(b1.boy) > length(b2.boy)

or

length(b2.boy) > length(b1.boy) 

or even

length(b1.boy) > length(b1.boy)

Update

Consider this simple snippet (t-sql used here):

declare @boys table (boy nvarchar(128))

insert into @boys
select 'Ed'
union
select 'Tom'
union
select 'Nick'

select b1.boy, b2.boy
from @boys as b1
    left outer join @boys as b2 on len(b1.boy) > len(b2.boy)

select b1.boy, b2.boy
from @boys as b1
    left outer join @boys as b2 on len(b2.boy) > len(b1.boy)

Output from first query will be

Ed      NULL
Nick    Ed
Nick    Tom
Tom     Ed

And from the second one:

Ed      Nick
Ed      Tom
Nick    NULL
Tom     Nick

Explanation:

Let's have a look at first query. It is basically: "take all records from table and combine each record with all other records having length of boy's name less than that record has". That's why whe getting Ed, Null pair - there are no records having length of name less than Ed has.

But in the second query condition is "take all records from table and combine each record with all other records having length of boy's name greater than that record has". That's why we're getting pairs Ed, Nick and Ed, Tom in this case.

Upvotes: 8

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239684

ON clauses can contain arbitrary predicates. Although it's common to see a join like this:

FROM Table1
         INNER JOIN
     Table2
         ON
            Table1ColumnA = Table2ColumnC and
            Table1ColumnB = Table2ColumnD

it is by no means required that the columns on the left are references from the first table and columns on the right are references to the second table.

For instance, when trying to find intervals that overlap, I would commonly construct a join as:

FROM
    Table1 t1
        INNER JOIN
    Table1 t2
        ON
             t1.Start < t2.End AND
             t2.Start < t1.End AND
             t1.ID != t2.ID

Note that the tables being used in each comparison are the opposite way around.

This also means that you can do more complex joins:

FROM
    Table1 t1
         LEFT JOIN
    Table2 t2
         ON
            t2.ID = t1.ID
         LEFT JOIN
    Table3 t3
         ON
             t1.ColB = t3.ColF
         INNER JOIN
    Table4 t4
         ON
               t4.ColA = t2.ColC OR
               t3.ColZ = t4.ColY

Which is effectively performing an INNER JOIN between tables 1 and 4 using either Table 2 or Table 3 (or both).

As I said at the top - the ON clauses can contain arbitrary predicates.

Upvotes: 0

Related Questions