Reputation: 49
I'm trying to find an efficient way of doing this matching process. I'm not sure if set based updates are the way to go or if a cursor should be used. I'd like to have the matching rules in a separate table as shown below (so that they don't have to be hard-coded in the stored procedure). This is in SQL Server 2014.
I have two tables I need to match based on a set of matching rules/criteria (in a third table). I'm trying to find an efficient method to accomplish this.
Here's the situation: we have payments that need to match to bookings - in order to do this we'll have a set of matching rules that we can go by. I'll need to match the tables based on those rules and update the data in the payment table with the match_code (the rule that was used to update the data).
Payments table:
╔═══════════╦══════════╦══════════════╦════════════╦═══════╦════════════╗
║ FirstName ║ LastName ║ Confirmation ║ Date ║ Trans ║ Match_code ║
╠═══════════╬══════════╬══════════════╬════════════╬═══════╬════════════╣
║ Scott ║ Bloom ║ 123456 ║ 2016-01-15 ║ ║ ║
║ Beverly ║ Smith ║ 65487 ║ 2016-08-16 ║ ║ ║
║ Cindy ║ Plum ║ 147852 ║ 2016-07-19 ║ ║ ║
╚═══════════╩══════════╩══════════════╩════════════╩═══════╩════════════╝
Bookings table:
╔═══════════╦══════════╦══════════════╦════════════╦═════════════╗
║ FirstName ║ LastName ║ Confirmation ║ Date ║ Transaction ║
╠═══════════╬══════════╬══════════════╬════════════╬═════════════╣
║ Alfred ║ Kim ║ 987456 ║ 2016-11-17 ║ 12345 ║
║ Beverly ║ Smith ║ 65487 ║ 2016-07-14 ║ 12346 ║
║ Cindy ║ Plum ║ 99898 ║ 2016-07-19 ║ 12347 ║
╚═══════════╩══════════╩══════════════╩════════════╩═════════════╝
Rules table:
╔════════════╦═══════════╦══════════╦══════════════╦══════╗
║ Match_code ║ FirstName ║ LastName ║ Confirmation ║ Date ║
╠════════════╬═══════════╬══════════╬══════════════╬══════╣
║ 1 ║ 1 ║ 1 ║ 1 ║ 1 ║
║ 2 ║ 1 ║ 1 ║ 1 ║ 0 ║
║ 3 ║ 1 ║ 1 ║ 0 ║ 1 ║
╚════════════╩═══════════╩══════════╩══════════════╩══════╝
The rules table identifies which fields are required to be considered a match for each Match_code
The matching procedure should run through and try to match the payments to bookings based on the match rules.
The procedure should update the payments.trans field and the payments.match_code field based on the match it found in the bookings table. The result would be this:
Payments table:
╔═══════════╦══════════╦══════════════╦════════════╦═══════╦════════════╗
║ FirstName ║ LastName ║ Confirmation ║ Date ║ Trans ║ Match_code ║
╠═══════════╬══════════╬══════════════╬════════════╬═══════╬════════════╣
║ Scott ║ Bloom ║ 123456 ║ 2016-01-15 ║ ║ 0 ║
║ Beverly ║ Smith ║ 65487 ║ 2016-08-16 ║ 12346 ║ 2 ║
║ Cindy ║ Plum ║ 147852 ║ 2016-07-19 ║ 12347 ║ 3 ║
╚═══════════╩══════════╩══════════════╩════════════╩═══════╩════════════╝
The result is that we now know for each payment which booking it matched to and which match_code it matched on.
If you have any recommendations for the best way to accomplish a task like this it would be greatly appreciated - thanks in advance!
Upvotes: 4
Views: 531
Reputation: 15816
Updated with loopless alternative!
This answers the question, but that doesn't mean it's a good idea.
Note that it should be wrapped in a suitable transaction to ensure that updates to the Payments
and Bookings
tables don't cause any confusion while this mess is running.
It loops through the rules, in order, trying to find matches and updating Payments
accordingly. It can be modified to perform the update
using dynamic SQL so that the query optimizer can have a shot at improving the performance.
-- Sample data.
declare @Payments as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int, MatchCode Int );
insert into @Payments ( FirstName, LastName, Confirmation, PaymentDate, TransactionId, MatchCode ) values
( 'Scott', 'Bloom', 123456, '20160115', NULL, NULL ),
( 'Beverly', 'Smith', 65487, '20160816', NULL, NULL ),
( 'Cindy', 'Plum', 147852, '20160719', NULL, NULL );
select * from @Payments;
declare @Bookings as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int );
insert into @Bookings ( FirstName, LastName, Confirmation, PaymentDate, TransactionId ) values
( 'Alfred', 'Kim', 987456, '20161117', 12345 ),
( 'Beverly', 'Smith', 65487, '20160714', 12346 ),
( 'Cindy', 'Plum', 99898, '20160719', 12347 );
select * from @Bookings;
declare @Rules as Table ( MatchCode Int, FirstName Bit, LastName Bit, Confirmation Bit, PaymentDate Bit );
insert into @Rules ( MatchCode, FirstName, LastName, Confirmation, PaymentDate ) values
( 1, 1, 1, 1, 1 ),
( 2, 1, 1, 1, 0 ),
( 3, 1, 1, 0, 1 );
select * from @Rules;
-- Process the data.
declare @MatchCode Int, @FirstName Bit, @LastName Bit, @Confirmation Bit, @PaymentDate Bit;
declare MatchPattern Cursor for
select MatchCode, FirstName, LastName, Confirmation, PaymentDate
from @Rules
order by MatchCode;
open MatchPattern;
fetch next from MatchPattern into @MatchCode, @FirstName, @LastName, @Confirmation, @PaymentDate;
-- Apply the matching rules in order.
while @@Fetch_Status = 0
begin
update @Payments
set MatchCode = @MatchCode, TransactionId = B.TransactionId
from @Payments as P inner join
@Bookings as B on
( B.FirstName = P.FirstName or @FirstName = 0 ) and
( B.LastName = P.LastName or @LastName = 0 ) and
( B.Confirmation = P.Confirmation or @Confirmation = 0 ) and
( B.PaymentDate = P.PaymentDate or @PaymentDate = 0 )
where P.MatchCode is NULL -- Skip any rows which have already been matched.
fetch next from MatchPattern into @MatchCode, @FirstName, @LastName, @Confirmation, @PaymentDate;
end;
close MatchPattern;
deallocate MatchPattern;
-- Handle any Payments that were not matched.
update @Payments
set MatchCode = 0
where MatchCode is NULL;
-- Display the result.
select * from @Payments;
Alternatively, if you think a cross join
will improve performance because All Cursors Are Evil℠. This is the SQL Super Collider approach: try smashing Payments
against Bookings
and seeing if the results happen to match any of the Rules
.
-- Sample data.
declare @Payments as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int, MatchCode Int );
insert into @Payments ( FirstName, LastName, Confirmation, PaymentDate, TransactionId, MatchCode ) values
( 'Scott', 'Bloom', 123456, '20160115', NULL, NULL ),
( 'Beverly', 'Smith', 65487, '20160816', NULL, NULL ),
( 'Cindy', 'Plum', 147852, '20160719', NULL, NULL );
select * from @Payments;
declare @Bookings as Table ( FirstName VarChar(10), LastName VarChar(10), PaymentDate Date, Confirmation Int, TransactionId Int );
insert into @Bookings ( FirstName, LastName, Confirmation, PaymentDate, TransactionId ) values
( 'Alfred', 'Kim', 987456, '20161117', 12345 ),
( 'Beverly', 'Smith', 65487, '20160714', 12346 ),
( 'Cindy', 'Plum', 99898, '20160719', 12347 );
select * from @Bookings;
declare @Rules as Table ( MatchCode Int, FirstName Bit, LastName Bit, Confirmation Bit, PaymentDate Bit );
insert into @Rules ( MatchCode, FirstName, LastName, Confirmation, PaymentDate ) values
( 1, 1, 1, 1, 1 ),
( 2, 1, 1, 1, 0 ),
( 3, 1, 1, 0, 1 );
select * from @Rules;
-- Process the data.
declare @True as Bit = 1, @False as Bit = 0;
-- The following is based on the assumption that Confirmation uniquely identifies Payments .
update @Payments
set TransactionId = NewTransactionId, MatchCode = NewMatchCode from
@Payments as P inner join (
select P.Confirmation, Min( B.TransactionId ) as NewTransactionId, Coalesce( Min( R.MatchCode ), 0 ) as NewMatchCode
from @Payments as P cross join
@Bookings as B left outer join
@Rules as R on
R.FirstName = case when P.FirstName = B.FirstName then @True else @False end and
R.LastName = case when P.LastName = B.LastName then @True else @False end and
R.Confirmation = case when P.Confirmation = B.Confirmation then @True else @False end and
R.PaymentDate = case when P.PaymentDate = B.PaymentDate then @True else @False end
where P.MatchCode is NULL
group by P.Confirmation ) as C on C.Confirmation = P.Confirmation;
-- Display the result.
select * from @Payments;
Upvotes: 0
Reputation: 1034
I agree with Beth that this would be much simpler with a 2-column table (matchID, fieldname).
There are a number of different approaches to this. One possibility would be to do something like the following, assuming that a primary key of some sort (id) exists in the payments and bookings table:
select payments.id, rules.match_Code,
SUM(CASE WHEN payments.Firstname = bookings.FirstName
and rules.fieldname = 'Firstname' THEN 1 ELSE 0 END +
CASE WHEN payments.lastname = bookings.lastname
and rules.fieldname = 'Lastname' THEN 1 ELSE 0 END +
CASE WHEN payments.confirmation = bookings.confirmation
and rules.fieldname = 'confirmation' THEN 1 ELSE 0 END) AS Matches
from bookings, payments, rules
group by payments.id, bookings.id, rules.match_Code
From this you could compare the total number of matches to the total number of rules for any particular match code:
SELECT Match_Code, Count(*) AS Rules
FROM Rules
GROUP BY Match_Code
If the number of matches equals the count of rules, you know you have a match for that code. Then it's simply a matter of picking the lowest level match code for anyone one payment id:
FirstName LastName match_Code Matches
Beverly Smith 1 3
Beverly Smith 2 3*
Beverly Smith 3 2
Cindy Plum 1 3
Cindy Plum 2 2
Cindy Plum 3 3*
I know this isn't truly dynamic and I don't know about performance issues, but it's pretty simple to add fields to the CASE WHEN statement as needed and still make changes to the rules definitions on demand.
Upvotes: 0
Reputation: 48187
First you need to LEFT JOIN
using OR
, and use CASE
to determinate where the match occurs.
SELECT P.[FirstName], P.[LastName], P.[Confirmation], P.[Date], B.[Transaction],
CASE WHEN P.[FirstName] = B.[FirstName] THEN 1 ELSE 0 END as match_1,
CASE WHEN P.[LastName] = B.[LastName] THEN 1 ELSE 0 END as match_2,
CASE WHEN P.[Confirmation] = B.[Confirmation] THEN 1 ELSE 0 END as match_3,
CASE WHEN P.[Date] = B.[Date] THEN 1 ELSE 0 END as match_4
FROM Payments P
LEFT JOIN Bookings B
ON P.[FirstName] = B.[FirstName]
OR P.[LastName] = B.[LastName]
OR P.[Confirmation] = B.[Confirmation]
OR P.[Date] = B.[Date]
Partial Output: P.* + B.[Transaction] + match.*
Then you use that query to check what rules follow
WITH cte as (
SELECT P.[FirstName], P.[LastName], P.[Confirmation], P.[Date], B.[Transaction],
CASE WHEN P.[FirstName] = B.[FirstName] THEN 1 ELSE 0 END as match_1,
CASE WHEN P.[LastName] = B.[LastName] THEN 1 ELSE 0 END as match_2,
CASE WHEN P.[Confirmation] = B.[Confirmation] THEN 1 ELSE 0 END as match_3,
CASE WHEN P.[Date] = B.[Date] THEN 1 ELSE 0 END as match_4
FROM Payments P
LEFT JOIN Bookings B
ON P.[FirstName] = B.[FirstName]
OR P.[LastName] = B.[LastName]
OR P.[Confirmation] = B.[Confirmation]
OR P.[Date] = B.[Date]
)
SELECT C.[FirstName], C.[LastName], C.[Confirmation], C.[Date], C.[Transaction],
COALESCE([Match_code], 0 ) as [Match_code]
FROM cte C
LEFT JOIN Rules R
ON C.match_1 = R.[FirstName]
AND C.match_2 = R.[LastName]
AND C.match_3 = R.[Confirmation]
AND C.match_4 = R.[Date]
Final Output
Final Note
Is possible your payment match one or more booking, also is possible match different rules. So you probably need use the last result as a subquery to select the lowest [Match_code]
Upvotes: 0
Reputation: 9607
Assuming your rules are static and never change (I know this is not the case,) I'd do something like:
WITH p (paymentID, rule1Key, rule2Key, rule3Key) AS
(SELECT
paymentID,
firstname + '|' + lastname + '|' + confirmation + '|' + [DATE] AS rule1Key,
firstname + '|' + lastname + '|' + confirmation AS rule2Key,
firstname + '|' + lastname + '|' + [DATE] AS rule3Key
FROM
payment),
b (TRANSACTION, rule1Key, rule2Key, rule3Key) AS
(SELECT
[TRANSACTION],
firstname + '|' + lastname + '|' + confirmation + '|' + [DATE] AS rule1Key,
firstname + '|' + lastname + '|' + confirmation AS rule2Key,
firstname + '|' + lastname + '|' + [DATE] AS rule3Key
FROM
bookings)
UPDATE p
SET trans =
CASE
WHEN b1 IS NOT NULL THEN b1.TRANSACTION
WHEN b2 IS NOT NULL THEN b2.TRANSACTION
WHEN b3 IS NOT NULL THEN b3.TRANSACTION
end,
SET match_code =
CASE
WHEN b1 IS NOT NULL THEN 1
WHEN b2 IS NOT NULL THEN 2
WHEN b3 IS NOT NULL THEN 3
end
FROM
p LEFT OUTER JOIN
b b1 ON
p.rule1key = b1.rule1key LEFT OUTER JOIN
b b2 ON
p.rule2key = b2.rule2key LEFT OUTER JOIN
b b3 ON
p.rule3key = b3.rule3key;
Understanding, however, your evil users want to play with the rules (they won't like their current restrictions, they'll be asking for <, >, and OR conditions later, mark my words! When they ask for () you revolt!
then you need them to structure the table differently for you so you can use dynamic sql to build your query from the contents of the rules table. They could have a 2-column table with the matchID and fieldName and assume they're always connected by AND conditions, so your rules would look like:
matchID fieldName
1 FirstName
1 LastName
1 Confirmation
1 DATE
2 FirstName
2 LastName
2 Confirmation
3 FirstName
3 LastName
3 DATE
Hopefully you can see how to get from those rule contents to a query you can generate to do your update.
Upvotes: 1