Andrew
Andrew

Reputation: 49

T-SQL match two tables based on matching criteria/rules

I'm trying to find an efficient way of doing this matching process. I'm not sure if set based updates are the way to go or if a cursor should be used. I'd like to have the matching rules in a separate table as shown below (so that they don't have to be hard-coded in the stored procedure). This is in SQL Server 2014.

I have two tables I need to match based on a set of matching rules/criteria (in a third table). I'm trying to find an efficient method to accomplish this.

Here's the situation: we have payments that need to match to bookings - in order to do this we'll have a set of matching rules that we can go by. I'll need to match the tables based on those rules and update the data in the payment table with the match_code (the rule that was used to update the data).

Payments table:

╔═══════════╦══════════╦══════════════╦════════════╦═══════╦════════════╗
║ FirstName ║ LastName ║ Confirmation ║    Date    ║ Trans ║ Match_code ║
╠═══════════╬══════════╬══════════════╬════════════╬═══════╬════════════╣
║ Scott     ║ Bloom    ║       123456 ║ 2016-01-15 ║       ║            ║
║ Beverly   ║ Smith    ║        65487 ║ 2016-08-16 ║       ║            ║
║ Cindy     ║ Plum     ║       147852 ║ 2016-07-19 ║       ║            ║
╚═══════════╩══════════╩══════════════╩════════════╩═══════╩════════════╝

Bookings table:

╔═══════════╦══════════╦══════════════╦════════════╦═════════════╗
║ FirstName ║ LastName ║ Confirmation ║    Date    ║ Transaction ║
╠═══════════╬══════════╬══════════════╬════════════╬═════════════╣
║ Alfred    ║ Kim      ║       987456 ║ 2016-11-17 ║       12345 ║
║ Beverly   ║ Smith    ║        65487 ║ 2016-07-14 ║       12346 ║
║ Cindy     ║ Plum     ║        99898 ║ 2016-07-19 ║       12347 ║
╚═══════════╩══════════╩══════════════╩════════════╩═════════════╝

Rules table:

╔════════════╦═══════════╦══════════╦══════════════╦══════╗
║ Match_code ║ FirstName ║ LastName ║ Confirmation ║ Date ║
╠════════════╬═══════════╬══════════╬══════════════╬══════╣
║          1 ║         1 ║        1 ║            1 ║    1 ║
║          2 ║         1 ║        1 ║            1 ║    0 ║
║          3 ║         1 ║        1 ║            0 ║    1 ║
╚════════════╩═══════════╩══════════╩══════════════╩══════╝

The rules table identifies which fields are required to be considered a match for each Match_code

The matching procedure should run through and try to match the payments to bookings based on the match rules.

The procedure should update the payments.trans field and the payments.match_code field based on the match it found in the bookings table. The result would be this:

Payments table:

╔═══════════╦══════════╦══════════════╦════════════╦═══════╦════════════╗
║ FirstName ║ LastName ║ Confirmation ║    Date    ║ Trans ║ Match_code ║
╠═══════════╬══════════╬══════════════╬════════════╬═══════╬════════════╣
║ Scott     ║ Bloom    ║       123456 ║ 2016-01-15 ║       ║          0 ║
║ Beverly   ║ Smith    ║        65487 ║ 2016-08-16 ║ 12346 ║          2 ║
║ Cindy     ║ Plum     ║       147852 ║ 2016-07-19 ║ 12347 ║          3 ║
╚═══════════╩══════════╩══════════════╩════════════╩═══════╩════════════╝

The result is that we now know for each payment which booking it matched to and which match_code it matched on.

If you have any recommendations for the best way to accomplish a task like this it would be greatly appreciated - thanks in advance!

Upvotes: 4

Views: 531

Answers (4)

HABO
HABO

Reputation: 15816

Updated with loopless alternative!

This answers the question, but that doesn't mean it's a good idea.

Note that it should be wrapped in a suitable transaction to ensure that updates to the Payments and Bookings tables don't cause any confusion while this mess is running.

It loops through the rules, in order, trying to find matches and updating Payments accordingly. It can be modified to perform the update using dynamic SQL so that the query optimizer can have a shot at improving the performance.

-- Sample data.
declare @Payments as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int, MatchCode Int );
insert into @Payments ( FirstName, LastName, Confirmation, PaymentDate, TransactionId, MatchCode ) values
  ( 'Scott', 'Bloom', 123456, '20160115', NULL, NULL ),
  ( 'Beverly', 'Smith', 65487, '20160816', NULL, NULL ),
  ( 'Cindy', 'Plum', 147852, '20160719', NULL, NULL );
select * from @Payments;

declare @Bookings as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int );
insert into @Bookings ( FirstName, LastName, Confirmation, PaymentDate, TransactionId ) values
  ( 'Alfred', 'Kim', 987456, '20161117', 12345 ),
  ( 'Beverly', 'Smith', 65487, '20160714', 12346 ),
  ( 'Cindy', 'Plum', 99898, '20160719', 12347 );
select * from @Bookings;

declare @Rules as Table ( MatchCode Int, FirstName Bit, LastName Bit, Confirmation Bit, PaymentDate Bit );
insert into @Rules ( MatchCode, FirstName, LastName, Confirmation, PaymentDate ) values
  ( 1, 1, 1, 1, 1 ),
  ( 2, 1, 1, 1, 0 ),
  ( 3, 1, 1, 0, 1 );
select * from @Rules;

-- Process the data.
declare @MatchCode Int, @FirstName Bit, @LastName Bit, @Confirmation Bit, @PaymentDate Bit;
declare MatchPattern Cursor for
  select MatchCode, FirstName, LastName, Confirmation, PaymentDate
    from @Rules
    order by MatchCode;
open MatchPattern;
fetch next from MatchPattern into @MatchCode, @FirstName, @LastName, @Confirmation, @PaymentDate;
-- Apply the matching rules in order.
while @@Fetch_Status = 0
  begin
  update @Payments
    set MatchCode = @MatchCode, TransactionId = B.TransactionId
    from @Payments as P inner join
      @Bookings as B on
        ( B.FirstName = P.FirstName or @FirstName = 0 ) and
        ( B.LastName = P.LastName or @LastName = 0 ) and
        ( B.Confirmation = P.Confirmation or @Confirmation = 0 ) and
        ( B.PaymentDate = P.PaymentDate or @PaymentDate = 0 )
    where P.MatchCode is NULL -- Skip any rows which have already been matched.
  fetch next from MatchPattern into @MatchCode, @FirstName, @LastName, @Confirmation, @PaymentDate;
  end;
close MatchPattern;
deallocate MatchPattern;

-- Handle any   Payments   that were not matched.
update @Payments
  set MatchCode = 0
  where MatchCode is NULL;

-- Display the result.
select * from @Payments;

Alternatively, if you think a cross join will improve performance because All Cursors Are Evil℠. This is the SQL Super Collider approach: try smashing Payments against Bookings and seeing if the results happen to match any of the Rules.

-- Sample data.
declare @Payments as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int, MatchCode Int );
insert into @Payments ( FirstName, LastName, Confirmation, PaymentDate, TransactionId, MatchCode ) values
  ( 'Scott', 'Bloom', 123456, '20160115', NULL, NULL ),
  ( 'Beverly', 'Smith', 65487, '20160816', NULL, NULL ),
  ( 'Cindy', 'Plum', 147852, '20160719', NULL, NULL );
select * from @Payments;

declare @Bookings as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int );
insert into @Bookings ( FirstName, LastName, Confirmation, PaymentDate, TransactionId ) values
  ( 'Alfred', 'Kim', 987456, '20161117', 12345 ),
  ( 'Beverly', 'Smith', 65487, '20160714', 12346 ),
  ( 'Cindy', 'Plum', 99898, '20160719', 12347 );
select * from @Bookings;

declare @Rules as Table ( MatchCode Int, FirstName Bit, LastName Bit, Confirmation Bit, PaymentDate Bit );
insert into @Rules ( MatchCode, FirstName, LastName, Confirmation, PaymentDate ) values
  ( 1, 1, 1, 1, 1 ),
  ( 2, 1, 1, 1, 0 ),
  ( 3, 1, 1, 0, 1 );
select * from @Rules;

-- Process the data.
declare @True as Bit = 1, @False as Bit = 0;
-- The following is based on the assumption that   Confirmation   uniquely identifies   Payments .
update @Payments
  set TransactionId = NewTransactionId, MatchCode = NewMatchCode from
  @Payments as P inner join (
    select P.Confirmation, Min( B.TransactionId ) as NewTransactionId, Coalesce( Min( R.MatchCode ), 0 ) as NewMatchCode
      from @Payments as P cross join
        @Bookings as B left outer join
        @Rules as R on
          R.FirstName = case when P.FirstName = B.FirstName then @True else @False end and
          R.LastName = case when P.LastName = B.LastName then @True else @False end and
          R.Confirmation = case when P.Confirmation = B.Confirmation then @True else @False end and
          R.PaymentDate = case when P.PaymentDate = B.PaymentDate then @True else @False end
      where P.MatchCode is NULL
      group by P.Confirmation ) as C on C.Confirmation = P.Confirmation;

-- Display the result.
select * from @Payments;

Upvotes: 0

JJ32
JJ32

Reputation: 1034

I agree with Beth that this would be much simpler with a 2-column table (matchID, fieldname).

There are a number of different approaches to this. One possibility would be to do something like the following, assuming that a primary key of some sort (id) exists in the payments and bookings table:

select payments.id, rules.match_Code, 
SUM(CASE WHEN payments.Firstname = bookings.FirstName 
       and rules.fieldname = 'Firstname' THEN 1 ELSE 0 END +
    CASE WHEN payments.lastname = bookings.lastname 
       and rules.fieldname = 'Lastname' THEN 1 ELSE 0 END +
    CASE WHEN payments.confirmation = bookings.confirmation 
       and rules.fieldname = 'confirmation' THEN 1 ELSE 0 END) AS Matches 
from bookings, payments, rules
group by payments.id, bookings.id, rules.match_Code

From this you could compare the total number of matches to the total number of rules for any particular match code:

SELECT Match_Code, Count(*) AS Rules 
FROM Rules
GROUP BY Match_Code

If the number of matches equals the count of rules, you know you have a match for that code. Then it's simply a matter of picking the lowest level match code for anyone one payment id:

FirstName   LastName    match_Code  Matches
Beverly     Smith       1           3
Beverly     Smith       2           3* 
Beverly     Smith       3           2
Cindy       Plum        1           3
Cindy       Plum        2           2
Cindy       Plum        3           3* 

I know this isn't truly dynamic and I don't know about performance issues, but it's pretty simple to add fields to the CASE WHEN statement as needed and still make changes to the rules definitions on demand.

Upvotes: 0

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48187

Sql Fiddle Demo

First you need to LEFT JOIN using OR, and use CASE to determinate where the match occurs.

SELECT P.[FirstName], P.[LastName], P.[Confirmation], P.[Date], B.[Transaction],
       CASE WHEN P.[FirstName]     = B.[FirstName]    THEN 1 ELSE 0 END as match_1,
       CASE WHEN P.[LastName]      = B.[LastName]     THEN 1 ELSE 0 END as match_2,
       CASE WHEN P.[Confirmation]  = B.[Confirmation] THEN 1 ELSE 0 END as match_3,
       CASE WHEN P.[Date]          = B.[Date]         THEN 1 ELSE 0 END as match_4
FROM Payments P
LEFT JOIN Bookings B
  ON P.[FirstName]     = B.[FirstName]
  OR P.[LastName]      = B.[LastName]
  OR P.[Confirmation]  = B.[Confirmation]
  OR P.[Date]          = B.[Date]  

Partial Output: P.* + B.[Transaction] + match.*

enter image description here

Then you use that query to check what rules follow

WITH cte as (
    SELECT P.[FirstName], P.[LastName], P.[Confirmation], P.[Date], B.[Transaction],
           CASE WHEN P.[FirstName]     = B.[FirstName]    THEN 1 ELSE 0 END as match_1,
           CASE WHEN P.[LastName]      = B.[LastName]     THEN 1 ELSE 0 END as match_2,
           CASE WHEN P.[Confirmation]  = B.[Confirmation] THEN 1 ELSE 0 END as match_3,
           CASE WHEN P.[Date]          = B.[Date]         THEN 1 ELSE 0 END as match_4
    FROM Payments P
    LEFT JOIN Bookings B
      ON P.[FirstName]     = B.[FirstName]
      OR P.[LastName]      = B.[LastName]
      OR P.[Confirmation]  = B.[Confirmation]
      OR P.[Date]          = B.[Date]  
)
SELECT C.[FirstName], C.[LastName], C.[Confirmation], C.[Date], C.[Transaction], 
      COALESCE([Match_code], 0 ) as [Match_code]
FROM cte C
LEFT JOIN Rules R
   ON C.match_1 = R.[FirstName]
  AND C.match_2 = R.[LastName]
  AND C.match_3 = R.[Confirmation]
  AND C.match_4 = R.[Date]

Final Output

enter image description here

Final Note

Is possible your payment match one or more booking, also is possible match different rules. So you probably need use the last result as a subquery to select the lowest [Match_code]

Upvotes: 0

Beth
Beth

Reputation: 9607

Assuming your rules are static and never change (I know this is not the case,) I'd do something like:

WITH p (paymentID, rule1Key, rule2Key, rule3Key) AS 
(SELECT
paymentID, 
firstname + '|' + lastname + '|' + confirmation + '|' + [DATE] AS rule1Key,
firstname + '|' + lastname + '|' + confirmation  AS rule2Key,
firstname + '|' + lastname + '|' + [DATE] AS rule3Key
FROM
payment),

b (TRANSACTION, rule1Key, rule2Key, rule3Key) AS
(SELECT
[TRANSACTION], 
firstname + '|' + lastname + '|' + confirmation + '|' + [DATE] AS rule1Key,
firstname + '|' + lastname + '|' + confirmation  AS rule2Key,
firstname + '|' + lastname + '|' + [DATE] AS rule3Key
FROM
bookings)

UPDATE p
SET trans = 
    CASE 
        WHEN b1 IS NOT NULL THEN    b1.TRANSACTION
        WHEN b2 IS NOT NULL THEN    b2.TRANSACTION
        WHEN b3 IS NOT NULL THEN    b3.TRANSACTION
    end,
SET match_code = 
    CASE 
        WHEN b1 IS NOT NULL THEN    1
        WHEN b2 IS NOT NULL THEN    2
        WHEN b3 IS NOT NULL THEN    3
    end
FROM
p LEFT OUTER JOIN
b b1 ON
p.rule1key = b1.rule1key LEFT OUTER JOIN
b b2 ON
p.rule2key = b2.rule2key LEFT OUTER JOIN
b b3 ON
p.rule3key = b3.rule3key;

Understanding, however, your evil users want to play with the rules (they won't like their current restrictions, they'll be asking for <, >, and OR conditions later, mark my words! When they ask for () you revolt!

then you need them to structure the table differently for you so you can use dynamic sql to build your query from the contents of the rules table. They could have a 2-column table with the matchID and fieldName and assume they're always connected by AND conditions, so your rules would look like:

matchID   fieldName
1         FirstName 
1         LastName 
1         Confirmation 
1         DATE
2         FirstName 
2         LastName 
2         Confirmation 
3         FirstName 
3         LastName 
3         DATE

Hopefully you can see how to get from those rule contents to a query you can generate to do your update.

Upvotes: 1

Related Questions