Fubudis
Fubudis

Reputation: 251

SQL Joining on Field with Nulls

I'm trying to match two tables where one of the tables stores multiple values as a string.

In the example below I need to classify each product ordered from the #Orders table with a #NewProduct.NewProductId.

The issue I'm having is sometimes we launch a new product like "Black Shirt", then later we launch an adaption to that product like "Black Shirt Vneck".

I need to match both changes correctly to the #Orders table. So if the order has Black and Shirt, but not Vneck, it's considered a "Black Shirt", but if the order has Black and Shirt and Vneck, it's considered a "Black Vneck Shirt."

The code below is an example - the current logic I'm using returns duplicates with the Left Join. Also, assume we can modify the format of #NewProducts but not #Orders.

IF              OBJECT_ID('tempdb.dbo.#NewProducts') IS NOT NULL DROP TABLE #NewProducts
CREATE TABLE    #NewProducts 
(
      ProductType VARCHAR(MAX)
    , Attribute_1 VARCHAR(MAX)
    , Attribute_2 VARCHAR(MAX)
    , NewProductId INT
)

INSERT      #NewProducts
VALUES
    ('shirt', 'black', 'NULL', 1),
    ('shirt', 'black', 'vneck', 2),
    ('shirt',  'white', 'NULL', 3)


IF              OBJECT_ID('tempdb.dbo.#Orders') IS NOT NULL DROP TABLE #Orders
CREATE TABLE    #Orders
(
      OrderId INT
    , ProductType VARCHAR(MAX)
    , Attributes    VARCHAR(MAX)
)

INSERT  #Orders
VALUES
    (1, 'shirt', 'black small circleneck'),
    (2, 'shirt', 'black large circleneck'),
    (3, 'shirt', 'black small vneck'),
    (4, 'shirt', 'black small vneck'),
    (5, 'shirt', 'white large circleneck'),
    (6, 'shirt', 'white small vneck')

SELECT      *
FROM        #Orders o
        LEFT JOIN #NewProducts np
            ON o.ProductType = np.ProductType
            AND CHARINDEX(np.Attribute_1, o.Attributes) > 0
            AND (
                    CHARINDEX(np.Attribute_2, o.Attributes) > 0
                OR np.Attribute_2 = 'NULL'
                )

Upvotes: 0

Views: 59

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269633

You seem to want the longest overlap:

SELECT *
FROM #Orders o OUTER APPLY
     (SELECT Top (1) np.*
      FROM #NewProducts np
      WHERE o.ProductType = np.ProductType AND
            CHARINDEX(np.Attribute_1, o.Attributes) > 0
      ORDER BY ((CASE WHEN CHARINDEX(np.Attribute_1, o.Attributes) > 0 THEN 1 ELSE 0 END) +
                (CASE WHEN CHARINDEX(np.Attribute_2, o.Attributes) > 0 THEN 1 ELSE 0 END)
               ) DESC
     ) np;

I can't say I'm thrilled with the need to do this. It seems like the Orders should contain numeric ids that reference the actual product. However, I can see how something like this is sometimes necessary.

Upvotes: 1

Steve Lovell
Steve Lovell

Reputation: 2564

I couldn't get Gordon's answer to work, and was part way through my own response when his came in. His idea of taking the biggest overlap helped. I've tweaked your NewProducts table, so that that side of things is "normalised" even if the Orders table cannot be. Code below or at rextester.com/ERIF13021

create table #NewProduct
(
NewProductID int primary key,
ProductType varchar(max),
ProductName varchar(max)
)

create table #Attribute
(
AttributeID int primary key,
AttributeName varchar(max)
)

create table #ProductAttribute
(
NewProductID int,
AttributeID int
)

insert into #NewProduct
values (1, 'shirt', 'black shirt'),
       (2, 'shirt', 'black vneck shirt'),
       (3, 'shirt', 'white shirt')

insert into #Attribute
values (1, 'black'),
       (2, 'white'),
       (3, 'vneck')

insert into #ProductAttribute
values (1,1),
       (2,1),
       (2,3),
       (3,2)


select top 1 with ties
*
from
(
select
    o.OrderId,
    p.NewProductID,
    p.ProductType,
    p.ProductName,
    o.Attributes,
    sum(case when charindex(a.AttributeName,o.Attributes)>0 then 1 else 0 end) as Matches
from
    #Orders o
    JOIN #Attribute a ON 
    charindex(a.AttributeName,o.Attributes)>0
    JOIN #ProductAttribute pa ON
    a.AttributeID = pa.AttributeID
    JOIN #NewProduct p ON
    pa.NewProductID = p.NewProductID AND
    o.ProductType = p.ProductType
group by
    o.OrderId,
    p.NewProductID,
    p.ProductType,
    p.ProductName,
    o.Attributes
) o2
order by
row_number() over (partition by o2.OrderID order by o2.Matches desc)

Upvotes: 0

Related Questions