Reputation: 21
Is there a reason why the same query, executed a number of times, has a huge variance in response times, anywhere from 50% to 200% of the projected response time? The times range from 6 seconds to 20 seconds, even though it is the only active query in the database.
Context:
Database on Postgres 9.6 on AWS RDS (with Provisioned IOPS)
Contains one table comprising five numeric columns, indexed on id, holding 200 million rows
The query:
SELECT col1, col2
FROM calculations
WHERE id > 0
AND id < 100000;
The query's explain plan:
Bitmap Heap Scan on calculation (cost=2419.37..310549.65 rows=99005 width=43)
Recheck Cond: ((id > 0) AND (id <= 100000))
-> Bitmap Index Scan on calculation_pkey (cost=0.00..2394.62 rows=99005 width=0)
Index Cond: ((id > 0) AND (id <= 100000))
Are there any reasons why a simple query like this isn't more predictable in response time?
Thanks.
Upvotes: 2
Views: 867
Reputation: 21
After investigation of the historical load, we have found out that the provisioned IOPS we originally configured had been exhausted during the last set of load tests performed on the environment.
According to Amazon's documentation (http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html), after this point Amazon does not guarantee consistency in execution times and the SLAs are no longer applicable.
We have confirmed that replicating the database onto a new instance of AWS RDS with same configuration yields consistent response times when executing the query multiple times.
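A check like this can be made repeatable by timing several runs and comparing the spread. The following is only a minimal sketch: `run_query` is a hypothetical callable standing in for whatever executes the query (it is not part of any library), and the helper names are made up for illustration.

```python
import statistics
import time


def time_runs(run_query, repeats=5):
    """Call run_query() `repeats` times and return the wall-clock durations in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: executes the query and fetches all rows
        timings.append(time.perf_counter() - start)
    return timings


def spread(timings):
    """Summarize how far apart the fastest and slowest runs are."""
    fastest = min(timings)
    slowest = max(timings)
    return {
        "min": fastest,
        "max": slowest,
        "median": statistics.median(timings),
        # Ratio of slowest to fastest run; infinite if the fastest run measured as 0.
        "max_over_min": slowest / fastest if fastest > 0 else float("inf"),
    }
```

For the 6-second and 20-second extremes reported in the question, `spread([6.0, 20.0])["max_over_min"]` is roughly 3.3; a ratio close to 1 across many runs would indicate consistent response times.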
Upvotes: 0
Reputation: 121834
A query may be (and should be, excluding special cases) more predictable in response time when you are the sole user of the server. In the case of a cloud server, you do not know anything about the actual server load, even if your query is the only one running on your database, because the server most likely hosts multiple databases at the same time. As you asked about response time, various circumstances involved in accessing a remote server over the network may also play a part.
Upvotes: 2
Reputation: 4820
When you see something like this in PostgreSQL EXPLAIN ANALYZE:
(cost=2419.37..310549.65)
...it doesn't mean the cost is between 2419.37 and 310549.65. These are in fact two different measures. The first value is the startup cost, and the second value is the total cost. Most of the time you'll care only about the total cost. Startup cost matters when that component of the execution plan feeds something like an EXISTS clause, where only the first row needs to be returned (so you care only about startup cost, not total, because execution stops almost immediately after startup).
The PostgreSQL documentation on EXPLAIN goes into greater detail about this.
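To make the two-number reading concrete, here is a small sketch that splits the `cost=startup..total` fragment of a plan line into its two values. The `parse_cost` helper and the regular expression are illustrative assumptions, not part of PostgreSQL or any client library.

```python
import re

# Matches the "cost=<startup>..<total>" fragment printed on each plan node.
COST_RE = re.compile(r"cost=(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)")


def parse_cost(plan_line):
    """Return (startup_cost, total_cost) from one EXPLAIN line, or None if absent."""
    match = COST_RE.search(plan_line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


line = "Bitmap Heap Scan on calculation (cost=2419.37..310549.65 rows=99005 width=43)"
startup_cost, total_cost = parse_cost(line)
print(f"startup cost: {startup_cost}, total cost: {total_cost}")
```

Applied to the plan in the question, the top node has a startup cost of 2419.37 and a total cost of 310549.65; these are planner cost units, not seconds.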
Upvotes: 2