Jan
Jan

Reputation: 11

How to join tables with multiple conditions?

I am new to making complex (let me know if this one is not complex) SQL queries and am struggling to create a query that will join 2 tables with multiple different combinations. Here is my query for just the 1st combination.

SELECT DISTINCT
    A.Id, A.Ref1, A.Ref2, A.Ref3, B.Ref1, B.Ref2, B.Ref3
FROM
    A
JOIN
    B ON TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1)) 
      AND TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2)) 
      AND TRIM(UPPER(B.Ref3)) = TRIM(UPPER(A.Ref3))

Ref1 - Ref3 are columns that are necessary in the 1st combination and this is classified as an "Exact Match".

However, if it so happens that only Ref1 and Ref2 matches, then it is classified as a "Partial Match" instead. Here is a list of possible combinations:

My initial attempt at this looked to have been working just fine but eventually broke once I added values that would make the result set non-distinct between the matches.

Anyway, for the actual goal, I intended for Exact Match entries to not be included when searching Partial Match 1 entries, then for Partial 2, Partial Match 1 and Exact Match should not be included (not sure if I am making enough sense at this point).

My initial attempts keep duplicating records which is really just redundant data since if it already matched exactly, the record should not have been partially matched as well. Here is a sample of the result I got:

Id     A.Ref1     A.Ref2     A.Ref3     B.Ref1     B.Ref2     B.Ref3     Match Type
 1     Val1       Val1       Val1       Val1       Val1       Val1       Exact Match
 1     Val1       Val1       NULL       Val1       Val1       NULL       Partial Match 1
 1     Val1       NULL       Val1       Val1       NULL       Val1       Partial Match 1
 1     Val1       NULL       NULL       Val1       NULL       NULL       Partial Match 2

In the scenario above, since Id 1 already is an "Exact Match", I don't want it to appear multiple times for "Partial Match" except for when there are other distinct field values for the record.

I am trying to do this all in just 1 query but I don't think there should be limitations to have them as separate ones instead. It would probably be easier that way but please do let me know if there is a possibility of it being done in just 1 go as this will be executed several times.

Any help on the matter will be much appreciated. Thank you.

UPDATE

Thanks to @Thorsten Kettner's answer below, I was able to create a query I am satisfied with using the With approach. The code below is how it looks now. There is some unreliability with the single matching Partial_Matches2 but there is no immediate concern for that. Overall pretty happy with the results I got already.

-- Get exact match
with exact_matches as 
(
    Select Distinct
            Id, A.Ref1, B.Ref1, A.Ref2, B.Ref2, A.Ref3, B.Ref3, 'Exact Match' as "Match_Type"
    from  A
    Join B
        On  TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1)) And 
            TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2)) And
            TRIM(UPPER(B.Ref3)) = TRIM(UPPER(A.Ref3))
),
-- Get partial matches 1
partial_matches1 as (
    Select 
        distinct A.*, 
        Case 
            When A.Ref1 <> A.Ref1 Or A.Ref1 is NULL
                Then 'No match by Ref 1'
            When A.Ref2 <> A.Ref2 Or A.Ref2 is NULL
                Then 'No match by Ref 2'
            When A.Ref3 <> A.Ref3 Or A.Ref3 is NULL
                Then 'No match by Ref 3'
        End as "Match_Type"
    from (
                Select Distinct
                        Id, A.Ref1, B.Ref1, A.Ref2, B.Ref2, A.Ref3, B.Ref3
                from A
                Join B
                    On  TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1)) And 
                        TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2)) 

                Union

                Select Distinct
                        Id, A.Ref1, B.Ref1, A.Ref2, B.Ref2, A.Ref3, B.Ref3
                from A
                Join B
                    On  TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1)) And 
                        TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2))

                Union

                Select Distinct
                        Id, A.Ref1, B.Ref1, A.Ref2, B.Ref2, A.Ref3, B.Ref3
                from A
                Join B
                    On  TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1)) And
                        TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2))
    ) As A
        Left Join exact_matches As B
            On B.Id = A.Id
        Where B.Id is NULL
),
-- Get partial matches 2
partial_matches2 as
(
    Select 
        distinct A.*, 

        Case 
            When A.Ref1 = A.Ref1
                Then 'Match by Ref1'
            When A.Ref2 = A.Ref2
                Then 'Match by Ref2'
            When A.Ref3 = A.Ref3
                Then 'Match by Ref3'
        End as "Match_Type"
    from (
                Select Distinct
                        Id, A.Ref1, B.Ref1, A.Ref2, B.Ref2, A.Ref3, B.Ref3
                from A
                Join B
                    On  TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1))

                Union

                Select Distinct
                        Id, A.Ref1, B.Ref1, A.Ref2, B.Ref2, A.Ref3, B.Ref3
                from A
                Join B
                    On  TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2))

                Union

                Select Distinct
                        Id, A.Ref1, B.Ref1, A.Ref2, B.Ref2, A.Ref3, B.Ref3
                from A
                Join B
                    On  TRIM(UPPER(B.Ref3)) = TRIM(UPPER(A.Ref3))
    ) As A
        Left Join exact_matches As B
            On B.Id = A.Id
        Left Join partial_matches1 As C
            On C.Id = A.Id
        Where B.Id is NULL and C.Id is NULL

)

-- Main Query
select 
    Distinct
    A.Id, B.Ref1, B.Ref2, B.Ref3,
    (
        Case
            When B.Match_Type Is Not NULL
                Then B.Match_Type
            Else
                'No Match'
        End
    ) As 'Match_Type'
    from  A

        /* Left Join on Matched Entries */
        Left Join (
            select * from exact_matches
            Union
            select * from partial_matches1
            Union
            select * from partial_matches2
        ) B
            On B.Id = A.Id

Order By A.Id

Upvotes: 0

Views: 4265

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 94884

You can cross join the two tables and then count how many columns match:

select
  id, a_ref1, a_ref2, a_ref3, b_ref1, b_ref2, b_ref3,
  case match_count 
    when 3 then 'Exact Match'
    when 2 then 'Partial Match 1'
    when 1 then 'Partial Match 2'
    else 'No Match'
  end as match_type
from
(
  select 
     a.id, 
     a.ref1 as a_ref1, a.ref2 as a_ref2, a.ref3 as a_ref3,
     b.ref1 as b_ref1, b.ref2 as b_ref2, b.ref3 as b_ref3,
     case when trim(upper(b.ref1)) = trim(upper(a.ref1)) then 1 else 0 end +
     case when trim(upper(b.ref2)) = trim(upper(a.ref2)) then 1 else 0 end +
     case when trim(upper(b.ref3)) = trim(upper(a.ref3)) then 1 else 0 end
       as match_count
  from a cross join b
) match_counted
-- where match_count = 3 /* only Exact Matches */
-- where match_count = 2 /* only Partial Matches Type 1 */
-- where match_count = 1 /* only Partial Matches Type 2 */
-- where match_count = 0 /* only Non-Matches */
;

You can use any of the suggested WHERE clauses in order to only get a particular match type. Or a variation if you want to pick several, e.g. where match_count >0 or where match_count in (1,2).

If you want to exclude non-matches, you can inner join the tables instead on trim(upper(b.ref1)) = trim(upper(a.ref1)) or trim(upper(b.ref2)) = trim(upper(a.ref2)) or trim(upper(b.ref3)) = trim(upper(a.ref3)). This may make the query faster (but is not guaranteed to do so). It will, however, make the query a tad more prone to errors and slightly lower its maintainability, because you'd state the conditions twice.

UPDATE

It turns out that you had problems describing the task correctly. It also turns out that The Impaler is a kind of mind reader :-) What you want is to join the best matching type to an A row. So if for an A row we find an "Exact Match" in B (all three conditions match), then we join that row. If we don't, then we look for a "Partial Match 1" (i.e. rows where only two conditions match) and join these rows. If there are no such rows, then we join the "Partial Match 2" rows, if such exist.

What I would do here is start with the same join, then rank the joined rows, then keep the best ranked rows.

with match_counted as
(
  select 
     a.id, 
     a.ref1 as a_ref1, a.ref2 as a_ref2, a.ref3 as a_ref3,
     b.ref1 as b_ref1, b.ref2 as b_ref2, b.ref3 as b_ref3,
     case when trim(upper(b.ref1)) = trim(upper(a.ref1)) then 1 else 0 end +
     case when trim(upper(b.ref2)) = trim(upper(a.ref2)) then 1 else 0 end +
     case when trim(upper(b.ref3)) = trim(upper(a.ref3)) then 1 else 0 end
       as match_count
  from a cross join b
)
, ranked as
(
  select
    match_counted.*,
    rank() over(partition by id order by match_count desc) as rnk
  from match_counted
  where match_count > 0
)
select
  id, a_ref1, a_ref2, a_ref3, b_ref1, b_ref2, b_ref3,
  case match_count
    when 3 then 'Exact Match'
    when 2 then 'Partial Match 1'
    when 1 then 'Partial Match 2'
    else 'No Match'
  end as match_type
from ranked
where rnk = 1
order by a.id; 

Upvotes: 1

O. Jones
O. Jones

Reputation: 108651

You can use CASE WHEN ... THEN in your SELECT clause to implement your match rules.

SELECT * 
FROM (
  Select Distinct A.Id, A.Ref1, A.Ref2, A.Ref3, B.Ref1, B.Ref2, B.Ref3,
         CASE 
           WHEN A.Ref1 = B.Ref1 AND A.Ref2 = B.Ref2 AND A.Ref3 = B.Ref3
           THEN 'Exact Match'

           WHEN A.Ref1 = B.Ref1 AND A.Ref2 = B.Ref2
           THEN 'Partial Match 1'
           WHEN A.Ref2 = B.Ref2 AND A.Ref3 = B.Ref3
           THEN 'Partial Match 1'
           WHEN A.Ref1 = B.Ref1 AND A.Ref3 = B.Ref3
           THEN 'Partial Match 1'

           WHEN (A.Ref1 = B.Ref1 OR A.Ref2 = B.Ref2 OR A.Ref3 = B.Ref3)
           THEN 'Partial Match 2'

           ELSE NULL
         END Match
    from A
    Join B
        On  TRIM(UPPER(B.Ref1)) = TRIM(UPPER(A.Ref1)) And 
            TRIM(UPPER(B.Ref2)) = TRIM(UPPER(A.Ref2)) And
            TRIM(UPPER(B.Ref3)) = TRIM(UPPER(A.Ref3))
  ) subquery
WHERE Match = 'Partial Match 1'

This gets the desired items from your resultset, omitting the ones you don't want.

SQL is all about sets and subqueries.

It's a bit verbose, but so are your matching rules.

Upvotes: 0

Related Questions