Reputation: 139
Table1:
id name address1 address2 address3
------------------------------------------
1 Jenny A B NULL
2 John C NULL NULL
3 Jenny B A NULL
4 John NULL NULL C
.....
id1
and id3
are the same in this condition, id2
and id4
are the same too.
Can I delete id3
and id4
? I'm using SQL Server 2019.
Upvotes: 0
Views: 73
Reputation: 1271151
I don't think fancy manipulation of the three columns is needed. Assuming you do not have duplicate values in the columns:
delete t1 from table1 t1
where exists (select 1
from table1 tt1
where tt1.id < t1.id and
(t1.address1 in (tt1.address1, tt1.address2, tt1.address3) or t1.address1 is null
) and
(t1.address2 in (tt1.address1, tt1.address2, tt1.address3) or t1.address2 is null
) and
(t1.address3 in (tt1.address1, tt1.address2, tt1.address3) or t1.address3 is null
) and
( (case when t1.address1 is null then 1 else 0 end +
case when t1.address2 is null then 1 else 0 end +
case when t1.address3 is null then 1 else 0 end
) =
(case when tt1.address1 is null then 1 else 0 end +
case when tt1.address2 is null then 1 else 0 end +
case when tt1.address3 is null then 1 else 0 end
)
)
);
This checks that each non-NULL
value matches and the number of NULL
values is the same. This implies that the two sets of values are equivalent.
The advantage is that you don't have to worry about separators -- which is a concern for either string_agg()
or XML.
Upvotes: 0
Reputation: 30023
You need to define a condition for duplicate rows. One possible approach to define such a condition is to aggregate ordered addresses. The following statement demonstrates this approach:
Table:
CREATE TABLE Data (
id int,
name varchar(10),
address1 varchar(100),
address2 varchar(100),
address3 varchar(100)
)
INSERT INTO Data
(id, name, address1, address2, address3)
VALUES
(1, 'Jenny', 'A', 'B', NULL),
(2, 'John', 'C', NULL, NULL),
(3, 'Jenny', 'B', 'A', NULL),
(4, 'John ', NULL, NULL, 'C')
Statement:
DELETE x
FROM (
SELECT
d.*,
-- Condition for equal addresses
ROW_NUMBER() OVER (PARTITION BY c.CheckCondtition ORDER BY d.id) AS rn
-- Condition for equal name and addresses
-- ROW_NUMBER() OVER (PARTITION BY d.name, c.CheckCondtition ORDER BY d.id) AS rn
FROM Data d
CROSS APPLY (
SELECT CONCAT(',', [address])
FROM (VALUES (d.address1), (d.address2), (d.address3)) v([address])
ORDER BY [address]
FOR XML PATH('')
) c(CheckCondtition)
) x
WHERE x.rn > 1
Upvotes: 1
Reputation: 147266
If you're using SQL Server 2017 or later, you can build up a unique identifier for each row based on the name and each of the address fields using STRING_AGG
; then find row numbers for each occurrence of that value (ordering by id
), and then you can delete the rows with row number > 1:
WITH CTE AS (
SELECT id, name AS value
FROM data
UNION ALL
SELECT id, address1
FROM data
UNION ALL
SELECT id, address2
FROM data
UNION ALL
SELECT id, address3
FROM data
),
CTE2 AS (
SELECT id, STRING_AGG(value, '%') WITHIN GROUP (ORDER BY value) AS v
FROM CTE
GROUP BY id
),
CTE3 AS (
SELECT id, v,
ROW_NUMBER() OVER (PARTITION BY v ORDER BY id) AS rn
FROM CTE2
)
DELETE d
FROM data d
JOIN CTE3 ON CTE3.id = d.id
WHERE CTE3.rn > 1
Upvotes: 1