Karivadha
Karivadha

Reputation: 11

Best Fit data retrial using left Join in MySQL

Below is something I am trying to achieve.

Data from 3 different files are loaded into a single table in which I need to categorize the data based on the names, and check for the best fit of the data across all rows.

The same name can have a maximum of 3 occurrences inside the table and a minimum of 1.

The data comparison should happen on all 3 rows or 2 rows (If the name did not come from one source) if there are more than one occurrences for a given name. If there is only 1 row for a given name that should be taken as the default value.

This is my attempt.

select 
case
when (coalesce(length(A.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0) AND coalesce(length(A.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0))  then A.x_manufacturer
when (coalesce(length(B.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(B.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0))  then B.x_manufacturer
when (coalesce(length(C.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(C.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0)) then C.x_manufacturer
 else C.x_manufacturer end as Best_Fit_x_manufacturer
from tbl1 A left outer join tbl1 B on 
A.name = B.name 
left outer join tbl1 C on C.name = B.name 
where A.sourceid=1 and B.sourceid=2 and C.sourceid=3  group by 
  C.name

Sample Data in Table:

Name  ManuFacturer source
A     AB           1
A     ABC          2
A     ABCD         3
B     BC           1

Expected Output

Name      ManuFacturer source
A         ABCD         3
B         BC           1

Upvotes: 0

Views: 56

Answers (1)

Caius Jard
Caius Jard

Reputation: 74660

As soon as you put a solid condition on an outer joined table, into the WHERE clause, the join reverts to INNER join behavior:

from 
  tbl1 A 
  left outer join tbl1 B on A.name = B.name 
  left outer join tbl1 C on C.name = B.name 
where 
  A.sourceid=1 and 
  B.sourceid=2 and --wrong; this will cause A outer join B to become an INNER join
  C.sourceid=3     --wrong; this will cause B outer join C to become an INNER join

Put your clauses in the ON instead:

from 
  tbl1 A 
  left outer join tbl1 B on A.name = B.name AND B.sourceid=2
  left outer join tbl1 C on C.name = B.name AND C.sourceid=3
where 
  A.sourceid=1

The reason why is:

Outer joins generate NULLs in every column if there's no match between rows, hence B.sourceid might well be null in some rows of the result set. Specifying WHERE B.sourceid=2 causes all the rows where sourceid is null, to disappear, because 2 is not equal to null (nothing is equal to null). This means the only rows that you can possibly get out of it is rows where there IS a match.. Which is an inner join

Upvotes: 2

Related Questions