Reputation: 776
I have clustered data on multiple tables, generally they looks like this:
CREATE TABLE 2012_03_09 (
guid_key integer,
property_key integer,
instance_id_key integer,
time_stamp timestamp without time zone,
"value" double precision
)
with these indexes:
CREATE INDEX 2012_03_09_a
ON 2012_03_09
USING btree
(guid_key, property_key, time_stamp);
CREATE INDEX 2012_03_09_b
ON 2012_03_09
USING btree
(time_stamp, property_key);
When I analyze my queries, total amount of time for Append operations bothers me. Can you explain me, why query is running too long? Is there any way how to optimize such query?
Sort (cost=262.50..262.61 rows=47 width=20) (actual time=1918.237..1918.246 rows=100 loops=1)
Output: 2012_04_26.time_stamp, 2012_04_26.value, 2012_04_26.instance_id_key
Sort Key: 2012_04_26.instance_id_key, 2012_04_26.time_stamp
Sort Method: quicksort Memory: 32kB
-> Append (cost=0.00..261.19 rows=47 width=20) (actual time=69.817..1917.848 rows=100 loops=1)
-> Index Scan using 2012_04_26_a on 2012_04_26 (cost=0.00..8.28 rows=1 width=20) (actual time=14.909..14.909 rows=0 loops=1)
Output: 2012_04_26.time_stamp, 2012_04_26.value, 2012_04_26.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_04_27_a on 2012_04_27 (cost=0.00..8.28 rows=1 width=20) (actual time=1.535..1.535 rows=0 loops=1)
Output: 2012_04_27.time_stamp, 2012_04_27.value, 2012_04_27.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_02_a on 2012_05_02 (cost=0.00..12.50 rows=2 width=20) (actual time=53.370..121.894 rows=6 loops=1)
Output: 2012_05_02.time_stamp, 2012_05_02.value, 2012_05_02.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_03_a on 2012_05_03 (cost=0.00..24.74 rows=5 width=20) (actual time=59.136..170.215 rows=11 loops=1)
Output: 2012_05_03.time_stamp, 2012_05_03.value, 2012_05_03.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_04_a on 2012_05_04 (cost=0.00..12.47 rows=2 width=20) (actual time=67.458..125.172 rows=5 loops=1)
Output: 2012_05_04.time_stamp, 2012_05_04.value, 2012_05_04.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_05_a on 2012_05_05 (cost=0.00..8.28 rows=1 width=20) (actual time=14.112..14.112 rows=0 loops=1)
Output: 2012_05_05.time_stamp, 2012_05_05.value, 2012_05_05.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_07_a on 2012_05_07 (cost=0.00..12.46 rows=2 width=20) (actual time=60.549..99.999 rows=4 loops=1)
Output: 2012_05_07.time_stamp, 2012_05_07.value, 2012_05_07.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_08_a on 2012_05_08 (cost=0.00..24.71 rows=5 width=20) (actual time=63.367..197.296 rows=12 loops=1)
Output: 2012_05_08.time_stamp, 2012_05_08.value, 2012_05_08.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_09_a on 2012_05_09 (cost=0.00..28.87 rows=6 width=20) (actual time=59.596..224.685 rows=15 loops=1)
Output: 2012_05_09.time_stamp, 2012_05_09.value, 2012_05_09.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_10_a on 2012_05_10 (cost=0.00..28.85 rows=6 width=20) (actual time=56.995..196.590 rows=13 loops=1)
Output: 2012_05_10.time_stamp, 2012_05_10.value, 2012_05_10.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_11_a on 2012_05_11 (cost=0.00..20.59 rows=4 width=20) (actual time=62.761..134.313 rows=8 loops=1)
Output: 2012_05_11.time_stamp, 2012_05_11.value, 2012_05_11.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_12_a on 2012_05_12 (cost=0.00..8.28 rows=1 width=20) (actual time=12.018..12.018 rows=0 loops=1)
Output: 2012_05_12.time_stamp, 2012_05_12.value, 2012_05_12.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_13_a on 2012_05_13 (cost=0.00..8.28 rows=1 width=20) (actual time=12.286..12.286 rows=0 loops=1)
Output: 2012_05_13.time_stamp, 2012_05_13.value, 2012_05_13.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_14_a on 2012_05_14 (cost=0.00..16.58 rows=3 width=20) (actual time=92.161..156.802 rows=6 loops=1)
Output: 2012_05_14.time_stamp, 2012_05_14.value, 2012_05_14.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_15_a on 2012_05_15 (cost=0.00..25.03 rows=5 width=20) (actual time=73.636..263.537 rows=14 loops=1)
Output: 2012_05_15.time_stamp, 2012_05_15.value, 2012_05_15.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
-> Index Scan using 2012_05_16_a on 2012_05_16 (cost=0.00..12.56 rows=2 width=20) (actual time=100.893..172.404 rows=6 loops=1)
Output: 2012_05_16.time_stamp, 2012_05_16.value, 2012_05_16.instance_id_key
Index Cond: ((guid_key = 2105) AND (property_key = 67) AND (time_stamp >= '2012-04-16 00:00:00'::timestamp without time zone) AND (time_stamp <= '2012-05-16 06:25:50.172'::timestamp without time zone))
Total runtime: 1918.745 ms
Posting also SQL query:
select time_stamp, value, instance_id_key as segment from perf_hourly_2012_04_26 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_04_27 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_02 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_03 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_04 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_05 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_07 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_08 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_09 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_10 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_11 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_12 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_13 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_14 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_15 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
UNION ALL
select time_stamp, value, instance_id_key as segment from 2012_05_16 where guid_key = 2105 and property_key=67 and time_stamp between '2012-04-16 00:00:00.0'::timestamp without time zone and '2012-05-16 06:25:50.172'::timestamp without time zone
ORDER BY 3 ASC, 1 ASC
Upvotes: 1
Views: 1232
Reputation: 20408
The UNION isn't your timing problem, its reported elapsed time is basically the sum of each partition's index scan time. Your _a indexes look to be appropriately selective for your query predicate. The real time culprit I see in the explain analyze is that it's taking a long time to retrieve just a few rows with an index scan on each partition. eg: 125ms for 5 rows on 2012_05_04. The index scan should invoke maybe 0-5 seeks depending on cache state and table size, and if the data isn't clustered, there would then be one seek per data row. A slow single spindle disk should be able to do a seek and block fetch in ~10ms, so worst case for that scan with a crappy storage system is around 100ms, but with more common 7200 or 10K rpm disks, and multiple spindles, the worst case assuming no cache hits should be under 50ms. With decent cache retention, I would expect no more than a few tens of milliseconds for each partition's index scan.
Does this query run faster on a second try immediately after the first? If so, that points to slow storage with a cold cache as the issue. What sort of storage is the database running on? If we're talking about slow laptop drive, or a high latency network mount, that would explain the poor IO perf. Index scan can also be impacted by extreme index bloat. If you have dozens or hundreds of dead index entries because of update/delete churn on the data with improper vacuum regimen, then that could be the culprit. Are these tables regularly vacummed and analyzed?
As Adrian Serafin suggested, you should look into Pg's table partitioning features.
Upvotes: 1
Reputation: 7141
Besides the append, all the rows appear to be obtained index scans of the first type. I'd have to wonder if that's the best index for this. As you seem to be selecting significant time ranges, the only other choices are guid_key and property_key. Which is more selective? The more selective column should be first (that is, if you aren't worried about sorting, which I don't think you should be for 100 rows) Secondly, did you add those indexes for the purposes of this query or other queries? It may make sense to drop them if they aren't useful anywhere else. Indexes can actually slow performance, especially if the table records are already in memory most of the time, as they might require the database to unload the records from memory to load the index (and then load the table records back up once you're finished with the index scan).
The only real suggestion here I can give is to play with it.
Edit:
(of course there's the other issues of why do these records not have some sort of primary key and there is/isn't clustering on the table itself that I ignored but they also come into play here.)
Upvotes: 2
Reputation: 7715
It looks like you should check Postgresql Partitioning. Your query will be simpler and it may perform better (not 100% sure, but I think it's worth a try)
Upvotes: 2