Pisoi Flo
Pisoi Flo

Reputation: 75

SQL - auto populate column with correct values

Below is the SQL table where CODE1 and CODE2 are populated and for CODE3 I have input the values that I would like to auto populate. There are over 500.000 rows in total.

CODE1   CODE2   CODE3  
 1      100       A  
 2      100       A  
 3      200       B  
 4      300       A  
 5      300       A  
 1      300       A  
 5      400       A  
 6      400       A  
 2      400       A  
 7      500       C
 8      300       A
 7      600       C
 2      800       A
 3      900       B

I have to auto populate CODE3 in order to group all connections defined by CODE1 & CODE2. If CODE 2 has the same value (100 for rows:1,2) CODE 3 will have same value (A) and if CODE 1 has the same value (1 for rows:1,6) CODE 3 will have same value (A).

Any idea how to achieve this?

Much appreciated!

Upvotes: 0

Views: 2398

Answers (1)

Bulat
Bulat

Reputation: 6969

This is how I understood it:

WITH CODE_COUNTS AS (
SELECT *, COUNT(*) OVER (PARTITION BY CODE2) as CODE2_DUPES
FROM CODES
) 
SELECT 
  CODE1, 
  CODE2,
  CASE WHEN CD.CODE2_DUPES > 1 THEN 'A'
      WHEN EXISTS (SELECT * FROM CODE_COUNTS
                   WHERE CD.CODE1 = CODE1 AND CODE2_DUPES > 1) THEN 'A'
      ELSE 'B' 
  END CODE3
FROM CODE_COUNTS CD;

This code first counts duplicates within each CODE2 in CTE. After that for each CODE1 we checks whether any other row with same CODE1 has a CODE2 with number of duplicates greater than one.

SqlFiddle that provides requested results

Update

WITH CODE_COUNTS AS (
SELECT code1, code2, 
   COUNT(*) OVER (PARTITION BY CODE2) as CODE2_DUPES,
   CASE -- Assign row number within CODE1, where CODE2 is unique
     WHEN COUNT(*) OVER (PARTITION BY CODE2) > 1 THEN 0
     ELSE  ROW_NUMBER() OVER (PARTITION BY CODE1 ORDER BY CODE2) 
   END rn
FROM CODES
) 
SELECT CD.CODE1, CD.CODE2,
 CASE WHEN CD.CODE2_DUPES > 1 THEN 'A' -- CODE is not unique
 ELSE CHAR(ASCII('A') + SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (
           ORDER BY CODE1) ) END CODE3
FROM CODE_COUNTS CD
ORDER BY CD.CODE1, CD.CODE2

There is a bit of magic involved in the code above, I will try to explain:

  1. I assumed that where CODE2 group has several records, we assign A.
  2. Where CODE2 has several records we assign Zero to rn field, which means that A will be incremented by Zero for all associated records.
  3. Where CODE2 is unique, we do running count of CODE1 and increment CODE3 accordingly.

Sql Fiddle for the updated example

I have kept the original version of SQL as it a simpler version of the same algorithm essentially.

Upvotes: 1

Related Questions