Reputation: 251
I'm trying to match two tables where one of the tables stores multiple values as a string.
In the example below I need to classify each product ordered from the #Orders table with a #NewProduct.NewProductId.
The issue I'm having is sometimes we launch a new product like "Black Shirt", then later we launch an adaption to that product like "Black Shirt Vneck".
I need to match both changes correctly to the #Orders table. So if the order has Black and Shirt, but not Vneck, it's considered a "Black Shirt", but if the order has Black and Shirt and Vneck, it's considered a "Black Vneck Shirt."
The code below is an example - the current logic I'm using returns duplicates with the Left Join. Also, assume we can modify the format of #NewProducts but not #Orders.
IF OBJECT_ID('tempdb.dbo.#NewProducts') IS NOT NULL DROP TABLE #NewProducts
CREATE TABLE #NewProducts
(
ProductType VARCHAR(MAX)
, Attribute_1 VARCHAR(MAX)
, Attribute_2 VARCHAR(MAX)
, NewProductId INT
)
INSERT #NewProducts
VALUES
('shirt', 'black', 'NULL', 1),
('shirt', 'black', 'vneck', 2),
('shirt', 'white', 'NULL', 3)
IF OBJECT_ID('tempdb.dbo.#Orders') IS NOT NULL DROP TABLE #Orders
CREATE TABLE #Orders
(
OrderId INT
, ProductType VARCHAR(MAX)
, Attributes VARCHAR(MAX)
)
INSERT #Orders
VALUES
(1, 'shirt', 'black small circleneck'),
(2, 'shirt', 'black large circleneck'),
(3, 'shirt', 'black small vneck'),
(4, 'shirt', 'black small vneck'),
(5, 'shirt', 'white large circleneck'),
(6, 'shirt', 'white small vneck')
SELECT *
FROM #Orders o
LEFT JOIN #NewProducts np
ON o.ProductType = np.ProductType
AND CHARINDEX(np.Attribute_1, o.Attributes) > 0
AND (
CHARINDEX(np.Attribute_2, o.Attributes) > 0
OR np.Attribute_2 = 'NULL'
)
Upvotes: 0
Views: 59
Reputation: 1269633
You seem to want the longest overlap:
SELECT *
FROM #Orders o OUTER APPLY
(SELECT Top (1) np.*
FROM #NewProducts np
WHERE o.ProductType = np.ProductType AND
CHARINDEX(np.Attribute_1, o.Attributes) > 0
ORDER BY ((CASE WHEN CHARINDEX(np.Attribute_1, o.Attributes) > 0 THEN 1 ELSE 0 END) +
(CASE WHEN CHARINDEX(np.Attribute_2, o.Attributes) > 0 THEN 1 ELSE 0 END)
) DESC
) np;
I can't say I'm thrilled with the need to do this. It seems like the Orders
should contain numeric ids that reference the actual product. However, I can see how something like this is sometimes necessary.
Upvotes: 1
Reputation: 2564
I couldn't get Gordon's answer to work, and was part way through my own response when his came in. His idea of taking the biggest overlap helped. I've tweaked your NewProducts
table, so that that side of things is "normalised" even if the Orders
table cannot be. Code below or at rextester.com/ERIF13021
create table #NewProduct
(
NewProductID int primary key,
ProductType varchar(max),
ProductName varchar(max)
)
create table #Attribute
(
AttributeID int primary key,
AttributeName varchar(max)
)
create table #ProductAttribute
(
NewProductID int,
AttributeID int
)
insert into #NewProduct
values (1, 'shirt', 'black shirt'),
(2, 'shirt', 'black vneck shirt'),
(3, 'shirt', 'white shirt')
insert into #Attribute
values (1, 'black'),
(2, 'white'),
(3, 'vneck')
insert into #ProductAttribute
values (1,1),
(2,1),
(2,3),
(3,2)
select top 1 with ties
*
from
(
select
o.OrderId,
p.NewProductID,
p.ProductType,
p.ProductName,
o.Attributes,
sum(case when charindex(a.AttributeName,o.Attributes)>0 then 1 else 0 end) as Matches
from
#Orders o
JOIN #Attribute a ON
charindex(a.AttributeName,o.Attributes)>0
JOIN #ProductAttribute pa ON
a.AttributeID = pa.AttributeID
JOIN #NewProduct p ON
pa.NewProductID = p.NewProductID AND
o.ProductType = p.ProductType
group by
o.OrderId,
p.NewProductID,
p.ProductType,
p.ProductName,
o.Attributes
) o2
order by
row_number() over (partition by o2.OrderID order by o2.Matches desc)
Upvotes: 0