Comparing Column Values and returning ERROR or OK

Question

I'm needing to verify a source system with a destination system and ensure the values are matching between them. The problem is the source system is a total mess and is proving hard to validate.

I've got the following sample data where they should all be OK, but they're showing as ERROR. Does anyone know a way of doing a comparison that would result as an OK for all for the below?

CREATE TABLE #testdata (
    ID INT
    ,ValueSource VARCHAR(800)
    ,ValueDestination VARCHAR(800)
    ,Value_Varchar_Check AS (
        CASE 
            WHEN coalesce(ValueSource, '0') = coalesce(ValueDestination, '0')
                THEN 'OK'
            ELSE 'ERROR'
            END
        )
    )

INSERT INTO #testdata (
    ID
    ,ValueSource
    ,ValueDestination
    )
SELECT 1
    ,'hepatitis c,other (specify)' 'hepatitis c, other (specify)'    
UNION ALL    
SELECT 2
    ,'lung problems / asthma,lung problems / asthma'
    ,'lung problems / asthma'    
UNION ALL    
SELECT 3
    ,'lung problems / asthma,diabetes'
    ,'diabetes, lung problems / asthma'    
UNION ALL    
SELECT 4
    ,'seizures/epilepsy,hepatitis c,seizures/epilepsy'
    ,'hepatitis c, seizures/epilepsy'

Nick · Accepted Answer

I don't think you can write this as a generated column as it is quite a tricky thing to compute. If you are using SQL Server 2016 or later, you can use STRING_SPLIT to convert the ValueSource and ValueDestination values into tables and then sort them alphabetically using a query like this:

SELECT DISTINCT ID, TRIM(value) AS value,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
FROM testdata
CROSS APPLY STRING_SPLIT(ValueSource, ',')

For ValueSource, this produces:

ID  value                   rn
1   hepatitis c             1
1   other (specify)         2
2   lung problems / asthma  1
3   diabetes                1
3   lung problems / asthma  2
4   hepatitis c             1
4   seizures/epilepsy       2

You can then FULL OUTER JOIN those two tables on ID, value and rn, and detect an error when there are null values from either side (since that implies that the values for a given ID and rn don't match):

WITH t1 AS (
  SELECT DISTINCT ID, TRIM(value) AS value,
         DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
  FROM testdata
  CROSS APPLY STRING_SPLIT(ValueSource, ',')
),
t2 AS (
  SELECT DISTINCT ID, TRIM(value) AS value,
         DENSE_RANK() OVER (PARTITION BY ID ORDER BY TRIM(value)) AS rn
  FROM testdata
  CROSS APPLY STRING_SPLIT(ValueDestination, ',')
)
SELECT COALESCE(t1.ID, t2.ID) AS ID,
       CASE WHEN COUNT(CASE WHEN t1.value IS NULL OR t2.value IS NULL THEN 1 END) > 0 THEN 'Error'
            ELSE 'OK'
       END AS Status
FROM t1
FULL OUTER JOIN t2 ON t2.ID = t1.ID AND t2.rn = t1.rn AND t2.value = t1.value
GROUP BY COALESCE(t1.ID, t2.ID)

Output (for your sample data):

ID  Status
1   OK
2   OK
3   OK
4   OK

Demo on SQLFiddle

You can then use the entire query above as a CTE (call it t3) to update your original table:

UPDATE t
SET t.Value_Varchar_Check = t3.Status
FROM testdata t
JOIN t3 ON t.ID = t3.ID

Output:

ID  ValueSource                                         ValueDestination                    Value_Varchar_Check
1   hepatitis c,other (specify)                         hepatitis c, other (specify)        OK
2   lung problems / asthma,lung problems / asthma       lung problems / asthma              OK
3   lung problems / asthma,diabetes                     diabetes, lung problems / asthma    OK
4   seizures/epilepsy,hepatitis c,seizures/epilepsy     hepatitis c, seizures/epilepsy      OK

Demo on SQLFiddle

Comparing Column Values and returning ERROR or OK

Answers (1)

Related Questions