Thomas

Reputation: 12107

row caching with Postgres

Being new to Postgres, I have a question regarding caching. I do not know how much is handled by the database, vs. how much I need to handle myself.

I have rows which are part of a time series. For the sake of example, let's say I have 1 row every second.

My query is to get the last minute (60 rows), on a rolling basis (from now - 1min to now).

Ideally, I could cache the last result in the client and request only the last second. Unfortunately, the data has holes in it: some rows are missing and may be added a few seconds later. So I need to query the whole thing anyway.

I could find where the holes are and make a query just for these, this is also an option.
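For reference, one common way to locate such holes is an anti-join against a generated calendar of expected timestamps. A sketch, assuming the table and timestamp column are named `my_table` and `ts` (both names are illustrative, not from the question):

```sql
-- Find the seconds in the last minute that have no row yet
-- (my_table and ts are assumed names).
SELECT s.expected
FROM generate_series(
         date_trunc('second', now()) - interval '59 seconds',
         date_trunc('second', now()),
         interval '1 second') AS s(expected)
LEFT JOIN my_table t ON t.ts = s.expected
WHERE t.ts IS NULL;
```

The result is the list of missing timestamps, which could then be fetched individually once they appear.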

What I was wondering is: how much caching does Postgres perform on its own?

If I have a query returning rows 3, 4, 5, 6 and my next query returns 4, 5, 6, 7. Would rows 4, 5, 6 have to be fetched again, or are they cached by the database?

Upvotes: 1

Views: 1759

Answers (2)

klin

Reputation: 121594

Postgres includes an extensive caching system. It is very likely that subsequent executions of the same query will use the cached data, but we have virtually no influence on it because the cache works autonomously. What can and should be done is to use prepared statements. Per the documentation:

A prepared statement is a server-side object that can be used to optimize performance. When the PREPARE statement is executed, the specified statement is parsed, analyzed, and rewritten. When an EXECUTE command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
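Applied to the rolling-minute query, a prepared statement might look like the following sketch (the table name `my_table` and column name `ts` are assumptions, not from the question):

```sql
-- Parse and rewrite the query once per session, parameterized on the window.
PREPARE last_window(interval) AS
    SELECT *
    FROM my_table
    WHERE ts >= now() - $1
    ORDER BY ts;

-- Subsequent executions skip the parse/analysis step.
EXECUTE last_window('1 minute');
```

Note that prepared statements are per-session, so they must be re-created after reconnecting.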

However, the scenario described in the question suggests a completely different approach to the issue, namely the use of the NOTIFY / LISTEN feature. In brief:

  • the server sends a notification every time a row is inserted into the table,
  • the application listens on the agreed channel and receives newly entered rows.

In this solution, the application performs the full query only once and then fills in new data from the notifications.

Sample trigger function:

create or replace function before_insert_on_my_table()
returns trigger language plpgsql as $$
begin
    -- publish the new row (cast to its text representation) on channel 'my_data'
    perform pg_notify('my_data', new::text);
    return new;
end $$;

create trigger before_insert_on_my_table
before insert on my_table
for each row execute procedure before_insert_on_my_table();

How to issue the LISTEN statement from an application depends on the language used. For example, the Python library psycopg2 provides convenient tools for it.
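With psycopg2, the listening side might look like this sketch (the connection string and the channel name `my_data` are assumptions matching the trigger above; adjust for your setup):

```python
# Sketch: receive NOTIFY messages from the 'my_data' channel with psycopg2.
# Assumes a reachable database named "mydb" and the trigger from above.
import select

import psycopg2

conn = psycopg2.connect("dbname=mydb")
conn.autocommit = True  # LISTEN should take effect outside a transaction

with conn.cursor() as cur:
    cur.execute("LISTEN my_data;")

while True:
    # Block for up to 5 seconds waiting for the socket to become readable.
    if select.select([conn], [], [], 5) == ([], [], []):
        continue  # timeout, poll again
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        # notify.payload is the inserted row rendered as text by new::text
        print(notify.channel, notify.payload)
```

This follows the polling pattern from the psycopg2 documentation; an async framework could integrate the connection's file descriptor into its own event loop instead.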

Read more about NOTIFY in the docs.

Upvotes: 1

Laurenz Albe

Reputation: 246383

PostgreSQL caches data automatically. Data are read, written and cached in units of 8kB, and whenever you access a row, the block containing the row will be cached.

All you have to do is make sure there is an index on the timestamp column of your table; otherwise PostgreSQL has to read the whole table to compute the result. With an index it will read (and cache) only the blocks that contain the data you need, and that will speed up your next query for those data.
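Concretely, that would be something like the following sketch (the table name `my_table` and column name `ts` are assumptions):

```sql
-- Index the timestamp column so the rolling-window query only touches
-- the blocks covering the last minute.
CREATE INDEX my_table_ts_idx ON my_table (ts);

-- Verify the plan uses the index rather than a sequential scan.
EXPLAIN SELECT * FROM my_table WHERE ts >= now() - interval '1 minute';
```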

Upvotes: 2
