Reputation: 189
I have some general queries on executing Postgres function.I recently noticed that if I store the output of any arithmetic or business operation in a variable and then call it at the time of execution in the query instead of doing the operation at time of execution it saved lot of time.
But I am not aware of any practices to be followed in general to reduce the time taken and also improve performance as I am new to Postgres.
Upvotes: 0
Views: 254
Reputation: 325061
Beware of read-modify-write cycles and transaction anomalies.
It's fine to cache values locally so long as you're careful about the scope with which you cache it and with cache invalidation. Be very careful about storing values past the lifetime of the transaction you read it in.
You must also be careful not to use that cached value as an input into a calculation that you write back to the database unless you SELECT ... FOR UPDATE
the value in a transaction that stays open during the write, you use a SERIALIZABLE
transaction, or you use some form of optimistic concurrency control.
If you aren't careful you can get yourself in real trouble, with classics like the banking concurrency example where account id=1 transfers $100 to accounts id=2 and id=3:
session1 session2
begin; begin;
select balance
from account
where id=1; => 100
select balance
from account
where id = 1; => 100
update account
set balance = balance + 100
where id = 2; -- this is safe
update account
set balance = balance + 100
where id = 3; -- this is safe
update account
set balance = 0 -- 100 - 100 = 0
where id = 1;
update account
set balance = 0 -- 100 - 100 = 0
where id = 1;
commit;
commit;
Whoops! You just added $100 to two people's accounts but only took $100 out of id=1's. The updates against id=2 and id=3 were OK because they did an in-place modification of the balance (balance = balance + 100
). The updates to id=1 were not, because they read the value, modified it client side, and wrote a new value.
This is what I mean by a read-modify-write cycle.
It would've been safe if we'd used SELECT ... FOR UPDATE
when reading the balance, because the second transaction would've got stuck until the 1st committed. But it would've been better still if we'd avoided the read-copy-write cycle and just done the updates in-place.
Caching is fine - but can introduce anomalies when the underlying data is updated but your cache doesn't get flushed and refreshed.
Cache invalidation is, in general, a hard problem, but Pg has some tools that help.
In particular, listen
and notify
, invoked from triggers, can be used to eagerly flush data from a cache stored in memcached/redis/whatever via a helper daemon. That means you're much less likely to have to flush large chunks of cache or drop the whole cache whenever something changes.
You also need to make decisions about how out of date it's acceptable for something to be. Sometimes you just don't care if a value is 5 seconds out of date. Or half an hour. Or a week. It depends on the application, the datum in question, etc.
Upvotes: 2
Reputation: 95761
There's nothing particularly wrong with storing values in variables.
If you're storing values so you can write SQL in a procedural, step-by-step way instead of a set-oriented way, then you'd probably be better off not doing that. SQL is a set-oriented language; it usually performs better if you write set-oriented SQL.
The risk of storing values and using them later is that the data underlying those values might have changed since you stored them. Whether that's a real problem is application-specific, but it's usually a problem best avoided.
Upvotes: 0