Reputation: 611
I just do not know how to google for my question, so I apologize if there is an answer out there for this question already. I have my query working, but while I was troubleshooting, I came across behavior I don't understand.
I have three tables with the exact same structure--one for employees that have been terminated, one for employees that have chosen to waive medical coverage, and one for employees that do not have medical coverage. There are multiple companies in this data, and I am grouping by company and division.
I am doing FULL OUTER JOINs because some company/division groups may have 0 terminated employees, but may have some waiver employees(for example).
The structure of the three source tables are: Company---Division---NumbOfEmployees
The structure of the destination view is:
Company---Division---NumbOfTermEmp---NumbOfWaiverEmp---NumbOfNonMedEmp
So I am basically flattening out the data.
If I join table 1 to table 2 and then table 2 to table 3, I get the data I expect.
If I join table 1 to table 2, and then table 1 to table 3, I do not. I will get an extra row where, for instance, no data exists in table 1 but it does in table 2.
Visual representation below:
I do not understand what is going on behind the scenes enough to figure out this behavior. Why is this?
Upvotes: 0
Views: 97
Reputation: 31775
I can immediately think of at least one scenario that would cause this.
Doing your "Bad" join (both T2 and T3 join to T1), suppose you have a row that exists in T2 and T3, but not T1.
Then you are basically doing this:
First Join
T1 T2
NULL Data
Second Join
T1 T3
NULL Data
And since nowhere are you joining T2 to T3, it doesn't see the connection of "Data" in both columns, so it creates two rows. One for the first join and one for the second.
To really flatten the data, you should include a relationship between T2 & T3 in your join condition to T3 (T3 joins to both T1 & T2).
At least, I think that would work, with both joins being FULL OUTER.
Upvotes: 2
Reputation: 1269503
If, by flatten the data, you mean one row per employee with the information from the other tables, then full outer join
is one possibility. Another is union all
with aggregation:
select NumbOfEmployees,
sum(NumTerminated) as NumTerminated,
sum(NumWaiver) as NumWaiver,
sum(NumNonMed) as NumNonMed
from ((select Company, Division,
NumbOfEmployees as NumTerminated, 0 as NumWaiver, 0 as NumNonMed
from terminations
) union all
(select Company, Division,
0 as NumTerminated, NumbOfEmployees as NumWaiver, 0 as NumNonMed
from waivers
) union all
(select Company, Division,
0 as NumTerminated, 0 as NumWaiver, NumbOfEmployees as NumNonMed
from waivers
)
) cd
group by Company, Division;
Full outer join
s can be difficult to work with, particularly with multiple tables, because the join keys may not have matched an earlier join. My preference would be to have a list of all companies and divisions, and then use left join
instead. Or, the above query using union all
and group by
.
Upvotes: 2