Reputation: 10459
I need to find similar records in SQL Server 2016, and it would be great if I could order result by how much they match. Match criteria for similar are 2 detail tables.
My table structures are:
//Basic:
Id Title
1 Title 1
2 Title 2
3 Title 3
4 Title 4
5 Title 5
//Parameters:
Id Title
1 Param 1
2 Param 2
3 Param 3
4 Param 4
5 Param 5
//Values:
Id Value
1 Val 1
2 Val 2
3 Val 3
4 Val 4
5 Val 5
And connection tables:
BasicId ParameterId ValueId
1 1 2
1 3 1
1 4 5
2 1 1
2 2 4
2 3 2
3 1 2
3 3 1
3 4 4
4 1 2
4 4 2
4 5 2
5 1 1
5 2 5
5 3 3
At some point in my program I create new Basic row (in memory) with Parameters and values. Now I need to know:
For first point I do it like this: 2 rows are equal if all parameters and there values matches. For now I calculate string with Ids of all parameters and values and add it to Basic table.
Basic becomes:
Id Title EqualString (this comes from connection table)
1 Title 1 1:2;3:1;4:5
2 Title 2 1:1;2:4;3:2
3 Title 3 1:2;3:1;4:4
4 Title 4 1:2;4:2;5:2
5 Title 5 1:1;2:5;3:3
This works great, since order is defined by ParameterId ASC and I have index on EqualString column. At this point I should say, that my basic table have arround 5 million records and growing.
But for second problem, I don't have a clue, how this kind of problem could be solved.
For example:
Input Title X1 1:1;2:4;3:2 100% Title 2
33% Title 5
Input Title X2 1:1;2:4;3:1 66% Title 2
33% Title 5
Input Title X3 1:2;3:1;5:5 66% Title 1
66% Title 3
Input Title X4 1:2;3:1;5:3 66% Title 3
33% Title 5
I use SQL Server 2016 and have complete control of computer on which database is installed.
There are around 100 parameters-values to check if they match (in upper example only 3 - they are separated with ;)
I think full text search is not the answer (or maybe it is), because I need exact match between 2 commas (2:5 for example).
Or I should go with completely different approach without calculating EqualString (on insert witch is rarely).
Use case:
I am building web page where product that sell is dynamic. That means, that user configure it before it buys.
For example: set height, width, colour, choose material ...
Now I need to know if this product already exists (someone else choose exact same parameters (width, height, colour) and values (100, 120, green) or this is completely new product.
Parameters which was configured are in parameters table and values are in values table.
Upvotes: 1
Views: 41
Reputation: 5157
This is copied verbatim from the question:
CREATE TABLE #Connection( BasicId INT, ParameterId INT, ValueId INT )
INSERT INTO #Connection
VALUES
( 1, 1, 2 ), ( 1, 3, 1 ), ( 1, 4, 5 ),
( 2, 1, 1 ), ( 2, 2, 4 ), ( 2, 3, 2 ),
( 3, 1, 2 ), ( 3, 3, 1 ), ( 3, 4, 4 ),
( 4, 1, 2 ), ( 4, 4, 2 ), ( 4, 5, 2 ),
( 5, 1, 1 ), ( 5, 2, 5 ), ( 5, 3, 3 )
This is sample data for new products
CREATE TABLE #NewConnection( BasicId INT, ParameterId INT, ValueId INT )
INSERT INTO #NewConnection
VALUES
( 10, 1, 2 ), ( 10, 2, 2 ), ( 10, 3, 1 ),
( 11, 1, 1 ), ( 11, 2, 4 ), ( 11, 3, 2 ),
( 12, 1, 0 ), ( 12, 2, 5 ), ( 12, 3, 3 )
SELECT NC.BasicID AS NewBasicID, C.BasicID, C.ParameterId, C.ValueId,
CONVERT( DECIMAL( 5, 0 ), COUNT( C.ParameterId ) OVER( PARTITION BY NC.BasicID, C.BasicID )) / ParamCount * 100 AS MatchStrength
FROM
( SELECT *, COUNT( ParameterId ) OVER( PARTITION BY BasicID ) AS ParamCount
FROM #NewConnection ) AS NC
INNER JOIN #Connection AS C
ON NC.ParameterId = C.ParameterId AND C.ValueId = NC.ValueId
ORDER BY NewBasicID, MatchStrength DESC, C.BasicID, C.ParameterId
#NewConnection
- contains "New Connection" data that needs to be checked against existing #Connection
tableParamCount
- counts the number of properties for each "new" BasicID. Needed to calculate MatchStrength
INNER JOIN
- joins "new" connections with existing ones on ParameterIDs and ValueIDsMatchStrength
- counts matched ParameterId
and ValueID
combinations for each NC.BasicID, C.BasicID group, then divides by the ParamCount
to get the percentage matched.References:
Note: performance for large data sets has not been tested but you may need the following indexes: { parameterID, valueID }, { BasicID }
If you want to return distinct matched BasicIDs and MatchStrength (no parameterIDs) then use the following:
SELECT DISTINCT NewBasicID, BasicID, MatchStrength
FROM
( SELECT NC.BasicID AS NewBasicID, C.BasicID, C.ParameterId, C.ValueId,
CONVERT( DECIMAL( 5, 0 ), COUNT( C.ParameterId ) OVER( PARTITION BY NC.BasicID, C.BasicID )) / ParamCount * 100 AS MatchStrength
FROM
( SELECT *, COUNT( ParameterId ) OVER( PARTITION BY BasicID ) AS ParamCount
FROM #NewConnection ) AS NC
INNER JOIN #Connection AS C
ON NC.ParameterId = C.ParameterId AND C.ValueId = NC.ValueId ) AS Matches
ORDER BY NewBasicID, MatchStrength DESC, BasicID
Upvotes: 1