Reputation: 1921
I measure the response time of a spatiotemporal query in two different setups.
a) a single instance with PostgreSQL and the PostGIS extension installed
b) 5 instances (1 master, 3 slaves, 1 client running pgpool-II) using pgpool-II replication.
My query is:
import time

start_TW = time.time()
fetchinTW = """SELECT col.vessel_hash, ST_X(col.the_geom) AS long, ST_Y(col.the_geom) AS lat
               FROM samplecol AS col
               WHERE col.timestamp >= '2016-06-10T00:00:00.000Z'
                 AND col.timestamp <= '2016-07-10T00:00:00.000Z'"""
cursor.execute(fetchinTW)  # cursor is an already-open DB-API cursor
end_TW = time.time()
print "Time to execute query: ", end_TW - start_TW
In the results I see that the response time in setup b) is smaller than in setup a).
Setup a -> Response Time: 45.3456 seconds
Setup b -> Response Time: 28.4658 seconds
Before I ran the experiments I expected the response time to be greater with pgpool-II replication (setup b) than with setup a), because the data are replicated between the nodes for availability and fault tolerance, and pgpool-II has to choose the node to send the query to, which adds overhead. A single node is a simpler case, so I expected it to respond faster (setup a).
Can anyone explain this behavior? Or have an idea why this happens?
Upvotes: 2
Views: 300
Reputation: 19623
I believe this is the expected behavior. According to the documentation, pgpool-II's load balancing and parallel query features allow queries to be distributed across servers:
Load Balance
If a database is replicated (because it is running in either replication mode or master/slave mode), performing a SELECT query on any server will return the same result. pgpool-II takes advantage of the replication feature in order to reduce the load on each PostgreSQL server. It does that by distributing SELECT queries among available servers, improving the system's overall throughput. In an ideal scenario, read performance could improve proportionally to the number of PostgreSQL servers. Load balancing works best in a scenario where there are a lot of users executing many read-only queries at the same time.
Parallel Query
Using the parallel query feature, data can be split among multiple servers, so that a query can be executed on all the servers concurrently, reducing the overall execution time. Parallel query works best when searching large-scale data.
Here is a list of the config parameters for the parallel mode feature.
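For illustration only, a minimal pgpool.conf sketch for a setup like b) might look like the fragment below. The hostnames, ports, and weights are placeholder assumptions rather than values from the question, and parallel_mode only exists in older pgpool-II releases (it was removed in 3.5), so check the parameter list for your version:

# pgpool.conf -- illustrative fragment, placeholder values only
# One master plus replicas; SELECTs are spread across backends by weight.
backend_hostname0 = 'master.example.com'
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = 'replica1.example.com'
backend_port1 = 5432
backend_weight1 = 1
backend_hostname2 = 'replica2.example.com'
backend_port2 = 5432
backend_weight2 = 1

# Replicate writes to every backend and load-balance read-only queries.
replication_mode = on
load_balance_mode = on

# Parallel query mode (pgpool-II 3.4 and earlier); requires the System DB.
# parallel_mode = on

With load_balance_mode on, the SELECT from the question can be served by any backend rather than a single server, which is consistent with setup b) responding faster than the single instance in a).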
Upvotes: 1