Jeff.Clark
Jeff.Clark

Reputation: 611

Need help understanding this TSQL join behavior

I just do not know how to google for my question, so I apologize if there is an answer out there for this question already. I have my query working, but while I was troubleshooting, I came across behavior I don't understand.

I have three tables with the exact same structure--one for employees that have been terminated, one for employees that have chosen to waive medical coverage, and one for employees that do not have medical coverage. There are multiple companies in this data, and I am grouping by company and division.

I am doing FULL OUTER JOINs because some company/division groups may have 0 terminated employees, but may have some waiver employees(for example).

The structure of the three source tables are: Company---Division---NumbOfEmployees

The structure of the destination view is:

Company---Division---NumbOfTermEmp---NumbOfWaiverEmp---NumbOfNonMedEmp

So I am basically flattening out the data.

If I join table 1 to table 2 and then table 2 to table 3, I get the data I expect.

If I join table 1 to table 2, and then table 1 to table 3, I do not. I will get an extra row where, for instance, no data exists in table 1 but it does in table 2.

Visual representation below:

enter image description here

I do not understand what is going on behind the scenes enough to figure out this behavior. Why is this?

Upvotes: 0

Views: 97

Answers (2)

Tab Alleman
Tab Alleman

Reputation: 31775

I can immediately think of at least one scenario that would cause this.

Doing your "Bad" join (both T2 and T3 join to T1), suppose you have a row that exists in T2 and T3, but not T1.

Then you are basically doing this:

First Join
T1    T2    
NULL  Data

Second Join
T1    T3
NULL  Data

And since nowhere are you joining T2 to T3, it doesn't see the connection of "Data" in both columns, so it creates two rows. One for the first join and one for the second.

To really flatten the data, you should include a relationship between T2 & T3 in your join condition to T3 (T3 joins to both T1 & T2).

At least, I think that would work, with both joins being FULL OUTER.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269503

If, by flatten the data, you mean one row per employee with the information from the other tables, then full outer join is one possibility. Another is union all with aggregation:

select NumbOfEmployees,
       sum(NumTerminated) as NumTerminated,
       sum(NumWaiver) as NumWaiver,
       sum(NumNonMed) as NumNonMed
from ((select Company, Division,
              NumbOfEmployees as NumTerminated, 0 as NumWaiver, 0 as NumNonMed
       from terminations
      ) union all
      (select Company, Division,
              0 as NumTerminated, NumbOfEmployees as NumWaiver, 0 as NumNonMed
       from waivers
      ) union all
      (select Company, Division,
              0 as NumTerminated, 0 as NumWaiver, NumbOfEmployees as NumNonMed
       from waivers
      )
     ) cd
group by Company, Division;

Full outer joins can be difficult to work with, particularly with multiple tables, because the join keys may not have matched an earlier join. My preference would be to have a list of all companies and divisions, and then use left join instead. Or, the above query using union all and group by.

Upvotes: 2

Related Questions