Reputation: 341
I am working on cleaning up a customer list from an ecommerce site. The customer list has a many to many relationship between customer ID and customer email. For example, a customer could place an order with the same email while logged in or anonymous, and the result would be two customer records with the same email but different customer IDs. Similarly, a customer could create orders with two different emails while logged in which would result in customer records with the same ID but different emails. Given this, I want to create a list of customers with truly unique IDs based on either email or customer number. In addition, there are situations where the email is blank, so customers records that both have blank emails but different IDs would need to be considered two different customers.
So given something like this:
CUST_ID CUST_EMAIL
------------------------
123 [email protected]
123 [email protected]
124 [email protected]
125 [email protected]
126
127
128 [email protected]
128 [email protected]
129 [email protected]
I would want to generate a key like this:
CUST_ID CUST_EMAIL NEW_CUST_KEY
------------------------------------
123 [email protected] 1
123 [email protected] 1
124 [email protected] 2
125 [email protected] 2
126 3
127 4
128 [email protected] 5
128 [email protected] 5
129 [email protected] 5
Upvotes: 1
Views: 706
Reputation: 14077
I was trying to map your users with their IDs to their emails and vice-versa and as a result I created this Frankenstein monster query:
DECLARE @Customers TABLE
(
CUST_ID INT
, CUST_EMAIL VARCHAR(20)
);
INSERT INTO @Customers (CUST_ID, CUST_EMAIL)
VALUES (123, '[email protected]')
, (123, '[email protected]')
, (124, '[email protected]')
, (125, '[email protected]')
, (126, '')
, (127, '')
, (128, '[email protected]')
, (128, '[email protected]')
, (129, '[email protected]');
SELECT DISTINCT C.CUST_ID
, C.CUST_EMAIL
, DENSE_RANK() OVER(ORDER BY T.CUST_ID) AS NEW_CUST_KEY
FROM @Customers AS C
INNER JOIN (
SELECT CUST_ID, CUST_EMAIL
FROM @Customers
EXCEPT
SELECT C2.CUST_ID, C2.CUST_EMAIL
FROM @Customers AS C1
INNER JOIN @Customers AS C2
ON C2.CUST_EMAIL = C1.CUST_EMAIL
AND C2.CUST_ID > C1.CUST_ID
AND C1.CUST_EMAIL <> ''
) AS T
ON CASE
WHEN (T.CUST_ID = C.CUST_ID AND T.CUST_EMAIL = C.CUST_EMAIL AND T.CUST_EMAIL = '') THEN 1
WHEN (T.CUST_ID = C.CUST_ID OR T.CUST_EMAIL = C.CUST_EMAIL) AND T.CUST_EMAIL <> '' THEN 1
ELSE 0
END = 1;
Using your test data it produced exactly this, it does seem to match your expectations:
╔═════════╦═════════════════╦═══════════════╗
║ CUST_ID ║ CUST_EMAIL ║ NEW_CUST_KEY ║
╠═════════╬═════════════════╬═══════════════╣
║ 123 ║ [email protected] ║ 1 ║
║ 123 ║ [email protected] ║ 1 ║
║ 124 ║ [email protected] ║ 2 ║
║ 125 ║ [email protected] ║ 2 ║
║ 126 ║ ║ 3 ║
║ 127 ║ ║ 4 ║
║ 128 ║ [email protected] ║ 5 ║
║ 128 ║ [email protected] ║ 5 ║
║ 129 ║ [email protected] ║ 5 ║
╚═════════╩═════════════════╩═══════════════╝
You can see this in real life on data.stackexchange.com
Let me know if this works in your actual DB.
Upvotes: 0
Reputation: 206
OLDTABLE - is your table NEWTABLE - will have result
CREATE TABLE #NEWTABLE
(
NEW_CUST_KEY int not null ,
CUST_ID int not null,
CUST_EMAIL nvarchar(100) null
)
------------------------------------
insert into #NEWTABLE (NEW_CUST_KEY,CUST_ID,CUST_EMAIL)
SELECT ROW_NUMBER() OVER(ORDER BY CUST_ID, CUST_EMAIL) AS NEW_CUST_KEY, CUST_ID, CUST_EMAIL
FROM
(
SELECT CUST_ID, CUST_EMAIL
FROM OLDTABLE
GROUP BY CUST_ID, CUST_EMAIL
) T
UPDATE Upd SET NEW_CUST_KEY = T.NEW_CUST_KEY
FROM #NEWTABLE Upd
join (
SELECT CUST_ID, min(NEW_CUST_KEY) AS NEW_CUST_KEY
FROM #NEWTABLE
GROUP BY CUST_ID) T
on Upd.CUST_ID = T.CUST_ID
UPDATE Upd SET NEW_CUST_KEY = T.NEW_CUST_KEY
FROM #NEWTABLE Upd
join (
SELECT CUST_EMAIL, min(NEW_CUST_KEY) AS NEW_CUST_KEY
FROM #NEWTABLE
GROUP BY CUST_EMAIL) T
on nullif(Upd.CUST_EMAIL,'') = nullif(T.CUST_EMAIL,'')
UPDATE Upd SET NEW_CUST_KEY = T.CHANGE_CUST_KEY
FROM #NEWTABLE Upd
join (
SELECT NEW_CUST_KEY, ROW_NUMBER() OVER(ORDER BY NEW_CUST_KEY) AS CHANGE_CUST_KEY
FROM #NEWTABLE
GROUP BY NEW_CUST_KEY) T
on Upd.NEW_CUST_KEY = T.NEW_CUST_KEY
select * from #NEWTABLE
Upvotes: 1
Reputation: 1917
I think you could use row_number..... Something like this......
SELECT DISTINCT CUST_ID, CUST_EMAIL
ROW_NUMBER() OVER(PARTITION BY CUST_ID, CUST_EMAIL) AS New_Cust_Key
FROM YOUR TABLES
Upvotes: 0