user2790103

Reputation: 351

Does Postgres store by primary keys?

I'm developing a note app; you can think of it as something like Evernote.
In this kind of app, every user queries only their own data, never other users' data.

Let's assume this table definition:

CREATE TABLE "notes" (
  id integer,
  user_id integer,
  content text,
  created_at timestamp
);

A user normally wants to query his latest 100 notes.

select * from notes where user_id = $my_id order by created_at desc limit 100

Even though I can add an index on (user_id, created_at) or (user_id, id), if the rows for the same user are scattered across the disk, fetching the data will still be slow.

What should I do to make sure that rows from the same user are stored as close together as possible?

I don't consider NoSql.

Upvotes: 0

Views: 93

Answers (1)

Laurenz Albe

Reputation: 246788

I can think of three ways to try:

  1. Create an index

    CREATE INDEX ON notes (user_id, created_at) INCLUDE (id);
    

    and query like this:

    SELECT id, created_at
    FROM notes
    WHERE user_id = $my_id
    ORDER BY created_at DESC
    LIMIT 100;
    

    If the table is autovacuumed often enough (lower autovacuum_vacuum_scale_factor for the table), you will get an index-only scan that doesn't have to fetch data from the table, so it can be fast even if the table is not clustered.

    Then you fetch the content for those notes that the user wants to inspect.

    That would be a net loss if the user wants to see the content of all 100 notes, but would win out if that is not the rule.
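    Lowering the scale factor for a single table can be done with a storage parameter. A minimal sketch (the value 0.01 is only illustrative, not a recommendation):

    -- make autovacuum run more often on this table, so the
    -- visibility map stays current and index-only scans stay cheap
    ALTER TABLE notes SET (autovacuum_vacuum_scale_factor = 0.01);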

  2. You can CLUSTER the table USING an index on (user_id, created_at).

    Then PostgreSQL rewrites the table in that order, and your index scan will be fast.

    There are two disadvantages to this approach:

    • While CLUSTER is running, the table is locked even against readers. So this is effectively down time.

    • The order is not maintained, but will decay as the table is modified, so you will have to repeat the CLUSTER run regularly.
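    As a sketch (the index name here is an assumption; substitute the name of your own index on (user_id, created_at)):

    CREATE INDEX notes_user_created_idx ON notes (user_id, created_at);

    -- rewrite the table in index order (locks the table while it runs)
    CLUSTER notes USING notes_user_created_idx;

    -- later runs can reuse the remembered index:
    CLUSTER notes;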

  3. List partition the table by user_id. You could lump several users into one partition to limit the number of partitions.

    Then you can put the individual partitions in different tablespaces on different disks.

    But it is questionable if that will give you a speed gain. Usually it is an advantage rather than a disadvantage to have data spread over several devices (striping).

    The main benefit here would be that dropping a partition is cheap, so if you have one customer per partition, it is easy to get rid of all their data at once.
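    A minimal sketch of such a layout (the partition names and the grouping of user_ids into partitions are assumptions for illustration):

    CREATE TABLE notes (
      id integer,
      user_id integer,
      content text,
      created_at timestamp
    ) PARTITION BY LIST (user_id);

    CREATE TABLE notes_p1 PARTITION OF notes FOR VALUES IN (1, 2, 3);
    CREATE TABLE notes_p2 PARTITION OF notes FOR VALUES IN (4, 5, 6);

    -- removing all data for the users in one partition is cheap:
    DROP TABLE notes_p2;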

Upvotes: 1
