EvanMPW
EvanMPW

Reputation: 341

SQL Server: Generate unique customer key based on two columns

I am working on cleaning up a customer list from an ecommerce site. The customer list has a many to many relationship between customer ID and customer email. For example, a customer could place an order with the same email while logged in or anonymous, and the result would be two customer records with the same email but different customer IDs. Similarly, a customer could create orders with two different emails while logged in which would result in customer records with the same ID but different emails. Given this, I want to create a list of customers with truly unique IDs based on either email or customer number. In addition, there are situations where the email is blank, so customers records that both have blank emails but different IDs would need to be considered two different customers.

So given something like this:

CUST_ID CUST_EMAIL
------------------------
123     [email protected]
123     [email protected]
124     [email protected]
125     [email protected]
126
127
128     [email protected]
128     [email protected]
129     [email protected]

I would want to generate a key like this:

CUST_ID CUST_EMAIL      NEW_CUST_KEY
------------------------------------
123     [email protected]     1
123     [email protected]     1
124     [email protected]     2
125     [email protected]     2
126                         3
127                         4
128     [email protected]     5
128     [email protected]     5
129     [email protected]     5

Upvotes: 1

Views: 706

Answers (3)

Evaldas Buinauskas
Evaldas Buinauskas

Reputation: 14077

I was trying to map your users with their IDs to their emails and vice-versa and as a result I created this Frankenstein monster query:

DECLARE @Customers TABLE
(
  CUST_ID INT
  , CUST_EMAIL VARCHAR(20)
);

INSERT INTO @Customers (CUST_ID, CUST_EMAIL)
VALUES (123, '[email protected]')
  , (123, '[email protected]')
  , (124, '[email protected]')
  , (125, '[email protected]')
  , (126, '')
  , (127, '')
  , (128, '[email protected]')
  , (128, '[email protected]')
  , (129, '[email protected]');


SELECT DISTINCT C.CUST_ID
  , C.CUST_EMAIL
  , DENSE_RANK() OVER(ORDER BY T.CUST_ID) AS NEW_CUST_KEY
FROM @Customers AS C
INNER JOIN (
  SELECT CUST_ID, CUST_EMAIL
  FROM @Customers
  EXCEPT
  SELECT C2.CUST_ID, C2.CUST_EMAIL
  FROM @Customers AS C1
  INNER JOIN @Customers AS C2
    ON C2.CUST_EMAIL = C1.CUST_EMAIL
    AND C2.CUST_ID > C1.CUST_ID
    AND C1.CUST_EMAIL <> ''
  ) AS T
  ON CASE
    WHEN (T.CUST_ID = C.CUST_ID AND T.CUST_EMAIL = C.CUST_EMAIL AND T.CUST_EMAIL = '') THEN 1
    WHEN (T.CUST_ID = C.CUST_ID OR T.CUST_EMAIL = C.CUST_EMAIL) AND T.CUST_EMAIL <> '' THEN 1
    ELSE 0
  END = 1;

Using your test data it produced exactly this, it does seem to match your expectations:

╔═════════╦═════════════════╦═══════════════╗
║ CUST_ID ║   CUST_EMAIL    ║ NEW_CUST_KEY  ║
╠═════════╬═════════════════╬═══════════════╣
║     123 ║ [email protected] ║             1 ║
║     123 ║ [email protected] ║             1 ║
║     124 ║ [email protected] ║             2 ║
║     125 ║ [email protected] ║             2 ║
║     126 ║                 ║             3 ║
║     127 ║                 ║             4 ║
║     128 ║ [email protected] ║             5 ║
║     128 ║ [email protected] ║             5 ║
║     129 ║ [email protected] ║             5 ║
╚═════════╩═════════════════╩═══════════════╝

You can see this in real life on data.stackexchange.com

Let me know if this works in your actual DB.

Upvotes: 0

Andrey Rybalkin
Andrey Rybalkin

Reputation: 206

OLDTABLE - is your table NEWTABLE - will have result

enter image description here

CREATE TABLE #NEWTABLE
(
    NEW_CUST_KEY int not null ,
    CUST_ID int not null, 
    CUST_EMAIL nvarchar(100) null   
)
------------------------------------
insert into #NEWTABLE (NEW_CUST_KEY,CUST_ID,CUST_EMAIL)
SELECT ROW_NUMBER() OVER(ORDER BY CUST_ID, CUST_EMAIL) AS NEW_CUST_KEY, CUST_ID, CUST_EMAIL 
FROM
(
    SELECT CUST_ID, CUST_EMAIL
    FROM OLDTABLE 
    GROUP BY CUST_ID, CUST_EMAIL
) T


UPDATE Upd SET NEW_CUST_KEY = T.NEW_CUST_KEY
FROM #NEWTABLE Upd
join (
    SELECT CUST_ID, min(NEW_CUST_KEY) AS NEW_CUST_KEY 
    FROM #NEWTABLE 
    GROUP BY CUST_ID) T
on Upd.CUST_ID = T.CUST_ID

UPDATE Upd SET NEW_CUST_KEY = T.NEW_CUST_KEY
FROM #NEWTABLE Upd
join (
    SELECT CUST_EMAIL, min(NEW_CUST_KEY) AS NEW_CUST_KEY 
    FROM #NEWTABLE 
    GROUP BY CUST_EMAIL) T    
on nullif(Upd.CUST_EMAIL,'') = nullif(T.CUST_EMAIL,'')    

UPDATE Upd SET NEW_CUST_KEY = T.CHANGE_CUST_KEY
FROM #NEWTABLE Upd
join (
    SELECT NEW_CUST_KEY, ROW_NUMBER() OVER(ORDER BY NEW_CUST_KEY) AS CHANGE_CUST_KEY
    FROM #NEWTABLE 
    GROUP BY NEW_CUST_KEY) T
on Upd.NEW_CUST_KEY = T.NEW_CUST_KEY

select * from #NEWTABLE

Upvotes: 1

AntDC
AntDC

Reputation: 1917

I think you could use row_number..... Something like this......

SELECT DISTINCT CUST_ID, CUST_EMAIL
ROW_NUMBER() OVER(PARTITION BY CUST_ID, CUST_EMAIL) AS New_Cust_Key
FROM YOUR TABLES

Upvotes: 0

Related Questions