GTF
GTF

Reputation: 8395

postgresql: selecting on the basis of the latest pivot row

Given a simple schema:

users
-----
- id<int>
- name<string>

organisations
-------------
- id<int>
- name<string>

organisations_users
-------------------
- id<int>
- userId<int>
- organisationId<int>
- joinedAt<date>

The N:N relation between users and organisations is tracking the organisational history of an individual user.

How do I get all the users who are currently at organisation X? This is basically the same as sorting the pivot table by joinedAt, then only getting the latest organisationId for each user, and then joining with the users who match. I've thought of various ways of collecting all the information I need but the only way I can think of doing this is filtering in the application code rather than in the SQL query itself (but I'm sure there's a way of doing it).

sample data

insert into users (id, name) values
    (1, 'Test User One'),
    (2, 'Test User Two'),
    (3, 'Test User Three');

insert into organisations (id, name) values
    (1, 'Test Org One'),
    (2, 'Test Org Two'),
    (3, 'Test Org Three');

insert into organisations_users ("userId", "organisationId", "joinedAt") values
    (1, 1, '2015-01-01'),
    (1, 2, '2016-01-01'),
    (2, 2, '2015-01-01'),
    (2, 3, '2016-01-01'),
    (3, 3, '2015-01-01'),
    (3, 2, '2016-01-01');

At this point, users [1, 3] are in organisation 2 and user 2 is in organisation 3. No-one is in organisation 1.

expected result

if I were to try to get everyone in organisation 2 I would obtain users 1 and 3 (with their joinedAt dates):

id | name            | organisationId | joinedAt
--------------------------------------------------
1  | Test User One   | 2              | 2016-01-01
3  | Test User Three | 2              | 2016-01-01

I would not get user 2, because although this user was in organisation 2 initially, the user joined organisation 3 subsequently.

Upvotes: 0

Views: 23

Answers (2)

wildplasser
wildplasser

Reputation: 44250

Alternative, avoiding window functions and max()

SELECT u.id, u.name
        , ou.organisationId
        , ou.joinedAt
FROM users u
JOIN organisations_users ou
        ON u.id = ou.userId
        AND ou.joinedAt <= now()
        AND NOT EXISTS ( -- suppress all but the latest
                SELECT * FROM organisations_users nx
                WHERE nx.userId = ou.userId
                AND nx.organisationId <> ou.organisationId
                AND nx.joinedAt <= now()
                AND nx.joinedAt > ou.joinedAt
                )
WHERE ou.organisationId = 2
        ;

Upvotes: 1

user330315
user330315

Reputation:

This should do it:

select u.id, u.name, ou.organisationId, ou.joinedat
from users u
  join (
     select userid, organisationid, joinedat, 
            max(joinedat) over (partition by userid) as last_join 
     from organisations_users
  ) ou on u.id = ou.userid and ou.joinedat = ou.last_join
where ou.organisationid = 2;

This assumes that a single user can't be in two organisations at the same time. So the most recent joinedat value identifies the user's current organisation.

If a user can join multiple organisations at the same time, you would need to change max(joinedat) over (partition by userid) to max(joinedat) over (partition by userid, organisationid)


Another option is to use distinct on () instead of a window function. Usually the distinct on () solution performs better:

select u.id, u.name, ou.organisationId, ou.joinedat
from users u
  join (
     select distinct on (userid) userid, organisationid, joinedat 
     from organisations_users
     order by userid, joinedat desc
  ) ou on u.id = ou.userid 
where ou.organisationid = 2

Upvotes: 1

Related Questions