Singsiblaw
Singsiblaw

Reputation: 13

MySQL query that would delete duplicates except the first and last rows for each user

I have been taking a dump of telephone number assignment on a weekly basis and then imported the data into a MySQL table. A record would have at least the following relevant information: ph_index, ph_num, userid and last_update. I would like to create a MySQL statement that would delete duplicated records but would leave the earliest and latest records for each user alone.

For example: If a telephone number has been assigned only to user1, I only want the earliest assigned and the most recent records.

Initial data:

ph_index, ph_num, userid and last_update
1, 999-999-9999, user1, 2000-01-01
2, 999-999-9999, user1, 2000-01-08
3, 999-999-9999, user1, 2000-01-15
4, 999-999-9999, user1, 2000-01-21

After deleting the duplicates

ph_index, ph_num, userid and last_update
1, 999-999-9999, user1, 2000-01-01
4, 999-999-9999, user1, 2000-01-21

If a telephone number has been assigned to user1 for a period, then re-assigned to user2 for another period, and then back to user1; I want the first and last dates each period.

Initial data:

ph_index, ph_num, userid and last_update
1, 999-999-9999, user1, 2000-01-01
2, 999-999-9999, user1, 2000-01-08
3, 999-999-9999, user1, 2000-01-15
4, 999-999-9999, user2, 2000-01-21
5, 999-999-9999, user2, 2000-01-28
6, 999-999-9999, user2, 2000-02-04
7, 999-999-9999, user1, 2000-02-11
8, 999-999-9999, user1, 2000-02-18
9, 999-999-9999, user1, 2000-02-25

After deleting the duplicates

ph_index, ph_num, userid and last_update
1, 999-999-9999, user1, 2000-01-01
3, 999-999-9999, user1, 2000-01-15
4, 999-999-9999, user2, 2000-01-21
6, 999-999-9999, user2, 2000-02-04
7, 999-999-9999, user1, 2000-02-11
9, 999-999-9999, user1, 2000-02-25

I would like to be able to do this for all 2000+ telephone numbers in the table.

TIA

Upvotes: 0

Views: 29

Answers (1)

id'7238
id'7238

Reputation: 2631

WITH cte AS (
    SELECT 
        *,
        (userid = LAG(userid, 1, '') OVER w_ph_num) 
        AND 
        (userid = LEAD(userid, 1, '') OVER w_ph_num) 
        AS deletion_flag
    FROM dump
    WINDOW w_ph_num AS (PARTITION BY ph_num ORDER BY last_update)
)
DELETE dump
FROM dump
JOIN cte ON cte.ph_index = dump.ph_index
WHERE cte.deletion_flag = 1

In MySQL since version 8.0.2, the LAG and LEAD window functions can be used to find the previous and next user for each phone number. If the user matches both, the record can be deleted.

ph_index ph_num userid last_update prev_user (LAG) next_user (LEAD) deletion_flag
1 999-999-9999 user1 2000-01-01 user1 0
2 999-999-9999 user1 2000-01-08 user1 user1 1
3 999-999-9999 user1 2000-01-15 user1 user2 0
4 999-999-9999 user2 2000-01-21 user1 user2 0
5 999-999-9999 user2 2000-01-28 user2 user2 1
6 999-999-9999 user2 2000-02-04 user2 user1 0
7 999-999-9999 user1 2000-02-11 user2 user1 0
8 999-999-9999 user1 2000-02-18 user1 user1 1
9 999-999-9999 user1 2000-02-25 user1 0

db<>fiddle

Upvotes: 2

Related Questions