Reputation: 30926
I'm trying to identify a distinct list of person_id
that have the same account_id
as another person.
The caveat is the person_id
who was last given the overlapping account_id
should not be included in that list.
id person_id account_id
+------------------------------------------+
| | | |
| 1 | 1 | 10 |
+------------------------------------------+
| | | |
| 2 | 2 | 10 |
+------------------------------------------+
| | | |
| 3 | 3 | 11 |
+------------------------------------------+
please note: this is a slightly simplified example not to be taken too literally.
This is the query I currently have
SELECT STRING_AGG(person_id, ',')
FROM accounts_map
WHERE created_at > '2001-01-10' -- ignore records smaller than 2001-01-10
GROUP BY account_id -- group by account id
HAVING count(*) > 1 -- any account that have multiple matches
What I can't figure out is
CREATE TABLE accounts_map
([id] int, [person_id] int, [account_id] int, [created_at] DATETIME)
;
INSERT INTO accounts_map
([id], [person_id], [account_id], [created_at])
VALUES
(1, 1, 10, '2001-01-10'),
(2, 1, 11, '2001-01-10'),
(2, 1, 12, '2001-01-10'),
(3, 2, 20, '2019-01-11'),
(4, 2, 21, '2019-01-11'),
(5, 2, 22, '2019-01-11'),
(3, 3, 30, '2019-01-12'),
(4, 3, 31, '2019-01-12'),
(5, 3, 32, '2019-01-12'),
-- same account ids as records with id (1,2,3) but inserted in different order
(6, 4, 12, '2019-01-13'),
(7, 4, 11, '2019-01-13'),
(8, 4, 10, '2019-01-13'),
-- same account ids as records with id (3,4,5)
(9, 5, 20, '2019-01-14'),
(10, 5, 21, '2019-01-14'),
(11, 5, 22, '2019-01-14'),
-- same account ids as records with id (9,10,11)
(12, 6, 20, '2019-01-15'),
(13, 6, 21, '2019-01-15'),
(14, 6, 22, '2019-01-15')
;
expected output
id
---
1 (not expected if date range cutoff specified) created_at > '2001-01-10'
2
4 (not expected if date range cutoff specified) created_at > '2001-01-10'
5
If it has an account that overlaps, the latest user who was granted the account becomes the valid one corrupting the old ones.
http://sqlfiddle.com/#!18/f53e5/1
Upvotes: 1
Views: 38
Reputation: 222622
In a subqueryn you can do a window count within groups of records sharing the same account_id
and use row_number()
to rank them by date. Then the outer query filters on records whose window count is greater than 1 and returns all records but the latest:
select person_id, account_id, created_at
from (
select
t.*,
row_number() over(partition by account_id order by created_at desc) rn,
count(*) over(partition by account_id) cnt
from accounts_map t
) t
where cnt > 1 and rn > 1
person_id | account_id | created_at --------: | ---------: | :------------------ 1 | 10 | 10/01/2001 00:00:00 1 | 11 | 10/01/2001 00:00:00 1 | 12 | 10/01/2001 00:00:00 5 | 20 | 14/01/2019 00:00:00 2 | 20 | 11/01/2019 00:00:00 5 | 21 | 14/01/2019 00:00:00 2 | 21 | 11/01/2019 00:00:00 5 | 22 | 14/01/2019 00:00:00 2 | 22 | 11/01/2019 00:00:00
Note: you provided sample data but unfortunately not the associated expected results, for us to validate the output of the query.
Upvotes: 2