Reputation: 11
Below is something I am trying to achieve.
Data from 3 different files are loaded into a single table in which I need to categorize the data based on the names, and check for the best fit of the data across all rows.
The same name can have a maximum of 3 occurrences inside the table and a minimum of 1.
The data comparison should happen on all 3 rows or 2 rows (If the name did not come from one source) if there are more than one occurrences for a given name. If there is only 1 row for a given name that should be taken as the default value.
This is my attempt.
select
case
when (coalesce(length(A.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0) AND coalesce(length(A.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0)) then A.x_manufacturer
when (coalesce(length(B.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(B.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0)) then B.x_manufacturer
when (coalesce(length(C.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(C.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0)) then C.x_manufacturer
else C.x_manufacturer end as Best_Fit_x_manufacturer
from tbl1 A left outer join tbl1 B on
A.name = B.name
left outer join tbl1 C on C.name = B.name
where A.sourceid=1 and B.sourceid=2 and C.sourceid=3 group by
C.name
Sample Data in Table:
Name ManuFacturer source
A AB 1
A ABC 2
A ABCD 3
B BC 1
Expected Output
Name ManuFacturer source
A ABCD 3
B BC 1
Upvotes: 0
Views: 56
Reputation: 74660
As soon as you put a solid condition on an outer joined table, into the WHERE clause, the join reverts to INNER join behavior:
from
tbl1 A
left outer join tbl1 B on A.name = B.name
left outer join tbl1 C on C.name = B.name
where
A.sourceid=1 and
B.sourceid=2 and --wrong; this will cause A outer join B to become an INNER join
C.sourceid=3 --wrong; this will cause B outer join C to become an INNER join
Put your clauses in the ON instead:
from
tbl1 A
left outer join tbl1 B on A.name = B.name AND B.sourceid=2
left outer join tbl1 C on C.name = B.name AND C.sourceid=3
where
A.sourceid=1
The reason why is:
Outer joins generate NULLs in every column if there's no match between rows, hence B.sourceid might well be null in some rows of the result set. Specifying WHERE B.sourceid=2
causes all the rows where sourceid is null, to disappear, because 2 is not equal to null (nothing is equal to null). This means the only rows that you can possibly get out of it is rows where there IS a match.. Which is an inner join
Upvotes: 2