akornev
akornev

Reputation: 37

Сomplex delete duplicates

I have two tables: Players and Games (below). I need to delete the duplicate player names using a query with a Using query row_number() function, but I need to leave the records that are in the Game table. If there are multiple records, then any of them can be left. If there are no records in Games table, then any of them can be left.

For example:

Players

ID Name
1 Oliver --> should be deleted (because ID=2 exists in Games table)
2 Oliver
3 Jack
4 Jack --> should be deleted (there is no Jack records in Games table, may leave any of them)
5 Jack --> should be deleted (there is no Jack records in Games table, may leave any of them)
6 Harry
7 Harry --> should be deleted (both records ID=6,7 are in Games table, may can delete either one)

Games

ID PlayerID
1 2
2 6
3 7

Result should be:

Players

ID Name
2 Oliver
3 Jack
6 Harry

Upvotes: 0

Views: 138

Answers (2)

gzzz
gzzz

Reputation: 368

DELETE FROM Players
WHERE ID IN (
    SELECT ID
    FROM (
        SELECT
            p.ID,
            ROW_NUMBER() OVER (PARTITION BY p.Name ORDER BY g.Amount DESC, p.ID) AS PlayerRank
        FROM Players AS p
        LEFT JOIN (
            SELECT
                PlayerID,
                COUNT(*) AS Amount
            FROM Games
            GROUP BY PlayerID
        )            AS g ON (p.ID = g.PlayerID)
    ) AS d
    WHERE PlayerRank > 1
)

Fiddle.

A little explanation for the guys below (from inner subqueries to outher):

  1. counting games per user;
  2. computing user rank by Name (more games – highter rating for the same name, NULLS is the last (users with no games));
  3. deleting users with duplicate names (has rank > 1).

Upvotes: -3

T N
T N

Reputation: 10205

The usual pattern for deleting duplicates is to number the rows having the same name and delete all rows having row number > 1.

However, you have the additional criteria that you wish to retain all player rows having games. This can be done by:

  1. Identify all Player rows having any associated Games rows.
  2. Number the Player rows having the same name, but place those having games at the top of the number sequence.
  3. Delete all Player rows that both have row number > 1 and do not have an associated Games row.
WITH PlayersHasGames AS (
    SELECT
        *,
        CASE WHEN EXISTS (SELECT * FROM Games G WHERE G.PlayerID = P.ID)
            THEN 1 ELSE 0 END
            AS HasGames
    FROM Players P
),
NumberedPlayers AS (
    SELECT
        *,
        ROW_NUMBER() OVER(
            PARTITION BY Name
            ORDER BY HasGames DESC, ID  -- Players with games first
            ) AS RowNum
    FROM PlayersHasGames
)
DELETE
FROM NumberedPlayers
WHERE RowNum > 1
AND HasGames = 0

Results:

ID Name
2 Oliver
3 Jack
6 Harry
7 Harry

See this db<>fiddle for a demo.

If you wish to delete one of the "Harry" rows, even though both have associated Games rows, you can exclude the AND HasGames = 0 condition. Note that this would violate referential integrity, as one of the Games rows would be an orphan. See this db<>fiddle.

Renumbering the orphan rows would require an extra step to reassign PlayerId values after numbering the Players, but before deleting the duplicates.

DECLARE @PlayerMapping TABLE (
    FromPlayerID INT,
    ToPlayerID INT
)

;WITH PlayersHasGames AS (
    SELECT
        *,
        CASE WHEN EXISTS (SELECT * FROM Games G WHERE G.PlayerID = P.ID)
            THEN 1 ELSE 0 END
            AS HasGames
    FROM Players P
),
NumberedPlayers AS (
    SELECT
        *,
        ROW_NUMBER() OVER(
            PARTITION BY Name
            ORDER BY HasGames DESC, ID  -- Players with games first
            ) AS RowNum
    FROM PlayersHasGames
)
INSERT INTO @PlayerMapping
SELECT FromPlayer.ID AS FromPlayerID, ToPlayer.ID AS ToPlayerID
FROM NumberedPlayers FromPlayer
JOIN NumberedPlayers ToPlayer
    ON ToPlayer.Name = FromPlayer.Name
    AND ToPlayer.RowNum = 1  -- Keeper player
WHERE FromPlayer.RowNum > 1  -- Any player to be deleted (and mapped)

UPDATE G
SET PlayerId = PM.ToPlayerID
FROM Games G
JOIN @PlayerMapping PM ON PM.FromPlayerID = G.PlayerID

DELETE P
FROM Players P
JOIN @PlayerMapping PM ON PM.FromPlayerID = P.ID

Resulting Player Mapping:

FromPlayerID ToPlayerID
1 2
4 3
5 3
7 6

Updated Players:

ID Name
2 Oliver
3 Jack
6 Harry

Updated Games (showing player name lookup):

ID PlayerID PlayerName
1 2 Oliver
2 6 Harry
3 6 Harry

See this db<>fiddle

Upvotes: 4

Related Questions