user391986
user391986

Reputation: 30926

Filter over duplicate records

I'm trying to identify a distinct list of person_id that have the same account_id as another person.

The caveat is the person_id who was last given the overlapping account_id should not be included in that list.

        id          person_id     account_id
   +------------------------------------------+
   |            |             |               |
   |     1      |     1       |      10       |
   +------------------------------------------+
   |            |             |               |
   |     2      |     2       |      10       |
   +------------------------------------------+
   |            |             |               |
   |     3      |     3       |      11       |
   +------------------------------------------+

please note: this is a slightly simplified example not to be taken too literally.


This is the query I currently have

SELECT STRING_AGG(person_id, ',')
FROM accounts_map
WHERE created_at > '2001-01-10' -- ignore records smaller than 2001-01-10
GROUP BY account_id -- group by account id
HAVING count(*) > 1 -- any account that have multiple matches

What I can't figure out is

  1. How for each account_id match overlap, exclude the latest record
  2. How to ignore records that have account_id marked as null, somehow <> NULL is not working
CREATE TABLE accounts_map
    ([id] int, [person_id] int, [account_id] int, [created_at] DATETIME)
;

INSERT INTO accounts_map
    ([id], [person_id], [account_id], [created_at])
VALUES
    (1, 1, 10, '2001-01-10'),
    (2, 1, 11, '2001-01-10'),
    (2, 1, 12, '2001-01-10'),

    (3, 2, 20, '2019-01-11'),
    (4, 2, 21, '2019-01-11'),
    (5, 2, 22, '2019-01-11'),

    (3, 3, 30, '2019-01-12'),
    (4, 3, 31, '2019-01-12'),
    (5, 3, 32, '2019-01-12'),

    -- same account ids as records with id (1,2,3) but inserted in different order
    (6, 4, 12, '2019-01-13'),
    (7, 4, 11, '2019-01-13'),
    (8, 4, 10, '2019-01-13'),

    -- same account ids as records with id (3,4,5)
    (9, 5, 20, '2019-01-14'),
    (10, 5, 21, '2019-01-14'),
    (11, 5, 22, '2019-01-14'),

    -- same account ids as records with id (9,10,11)
    (12, 6, 20, '2019-01-15'),
    (13, 6, 21, '2019-01-15'),
    (14, 6, 22, '2019-01-15')
;

expected output

id
---
1 (not expected if date range cutoff specified) created_at > '2001-01-10'
2
4 (not expected if date range cutoff specified) created_at > '2001-01-10'
5

If it has an account that overlaps, the latest user who was granted the account becomes the valid one corrupting the old ones.

http://sqlfiddle.com/#!18/f53e5/1

Upvotes: 1

Views: 38

Answers (1)

GMB
GMB

Reputation: 222622

In a subqueryn you can do a window count within groups of records sharing the same account_id and use row_number() to rank them by date. Then the outer query filters on records whose window count is greater than 1 and returns all records but the latest:

select person_id, account_id, created_at
from (
    select 
        t.*, 
        row_number() over(partition by account_id order by created_at desc) rn,
        count(*) over(partition by account_id) cnt
    from accounts_map t
) t
where cnt > 1 and rn > 1

Demo on DB Fiddle:

person_id | account_id | created_at         
--------: | ---------: | :------------------
        1 |         10 | 10/01/2001 00:00:00
        1 |         11 | 10/01/2001 00:00:00
        1 |         12 | 10/01/2001 00:00:00
        5 |         20 | 14/01/2019 00:00:00
        2 |         20 | 11/01/2019 00:00:00
        5 |         21 | 14/01/2019 00:00:00
        2 |         21 | 11/01/2019 00:00:00
        5 |         22 | 14/01/2019 00:00:00
        2 |         22 | 11/01/2019 00:00:00

Note: you provided sample data but unfortunately not the associated expected results, for us to validate the output of the query.

Upvotes: 2

Related Questions