Hoang Hiep
Hoang Hiep

Reputation: 2412

Optimise many-to-many join

I have three tables: groups and people and groups_people which forms a many-to-many relationship between groups and people.

Schema:

CREATE TABLE groups (
    id          SERIAL PRIMARY KEY,
    name        TEXT
);

CREATE TABLE people (
    id          SERIAL PRIMARY KEY,
    name        TEXT,
    join_date   TIMESTAMP
);

CREATE TABLE groups_people (
    group_id    INT REFERENCES groups(id),
    person_id   INT REFERENCES people(id)
);

When I want to query for the latest 10 people who recenlty joined the group which has id = 1:

WITH person_ids AS (SELECT person_id FROM groups_people WHERE group_id = 1)
SELECT * FROM people WHERE id = ANY(SELECT person_id FROM person_ids)
ORDER BY join_date DESC LIMIT 10;

The query needs to scan all of the joined people then ordering them before selecting. That would be slow if the group containing too many people. Is there anyway to work around it?

Upvotes: 1

Views: 82

Answers (3)

Pரதீப்
Pரதீப்

Reputation: 93724

Try rewriting using EXISTS

SELECT *
FROM   people p
WHERE  EXISTS (SELECT 1
               FROM   groups_people ps
               WHERE  p.id = ps.person_id and group_id = 1)
ORDER  BY join_date DESC 
LIMIT 10;

Upvotes: 1

leeyuiwah
leeyuiwah

Reputation: 7152

Schema (re-)design to allow same person joining multiple group

Since you mentioned that the relationship between groups and people is many-to-many, I think you may want to move join_date to groups_people (from people) because the same person can join different groups and each such event has its own join_date

So I would change the schema to

CREATE TABLE people (
    id          SERIAL PRIMARY KEY,
    name        TEXT --,                    -- change
    -- join_date   TIMESTAMP                -- delete
);                                      

CREATE TABLE groups_people (
    group_id    INT REFERENCES groups(id),
    person_id   INT REFERENCES people(id),  -- change
    join_date   TIMESTAMP                   -- add
);

Query

select
    p.id
    , p.name
    , gp.join_date
from
    people          as p
    , groups_people as gp
where
    p.id = gp.person_id 
    and gp.group_id=1 
order by gp.join_date desc
limit 10 

Disclaimer: The above query is in MySQL syntax (the question was originally tagged with MySQL)

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1269873

This seems much easier to write as a simple join with order by and limit:

select p.*
from people p join
     groups_people gp
     on p.id = gp.person_id
where gp.group_id = 1
order by gp.join_date desc
limit 10;   -- or fetch first 10 rows only

Upvotes: 2

Related Questions