Reputation: 1155
I have lots of rows in a table in a Postgres DB.
I insert into this table every 20 minutes, clear out old entries once a day, and run only two kinds of SELECT against it.
So I want to reduce the time I wait for these selections.
The first selection is of the form:
SELECT * FROM table WHERE item = <item_id>
The second is of the form:
SELECT DISTINCT datetime FROM table
So, to optimize the first selection, I can create an index on the item field. As I understand it, this technique works well for queries where something equals something.
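A minimal sketch of that index, assuming the column name from the query above (the table name here is a placeholder, since the real one is not shown yet):

```sql
-- A B-tree index speeds up equality lookups such as WHERE item = <item_id>
CREATE INDEX ON my_table (item);
```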
But I don't know how to optimize my second selection query. I think something like partitioning should help me, but there are several types of partitioning and I am a bit confused.
So what is the best option for optimizing my queries?
Also, I am using Python and Django models. If there is a good library that can do all the dirty work, that would be great. The most suitable one I have found so far is: http://architect.readthedocs.io/
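For reference, a plain index can also be declared directly on a Django model field without an extra library; this is only a sketch with hypothetical model and field names:

```python
from django.db import models

class AuctionEntry(models.Model):
    # db_index=True makes Django create a database index on this column,
    # which serves the equality query (item = <item_id>)
    item = models.IntegerField(db_index=True)
    datetime = models.DateTimeField(db_index=True)
```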
Edit 1: Thanks to Evan Carrol.
I tried using an index for the second query. The command:
explain analyze select distinct time_updated from wow_auction_gordunni
gives:
HashAggregate (cost=335091.65..335092.51 rows=86 width=8) (actual time=4246.582..4246.607 rows=91 loops=1)
Group Key: time_updated
-> Seq Scan on wow_auction_gordunni (cost=0.00..313574.92 rows=8606692 width=8) (actual time=0.047..2257.979 rows=8616562 loops=1)
Planning time: 0.080 ms
Execution time: 4246.675 ms
Then, after creating an index and running VACUUM:
CREATE INDEX ON wow_auction_gordunni (time_updated);
VACUUM ANALYZE wow_auction_gordunni;
explain analyze select distinct time_updated from wow_auction_gordunni;
gives the following:
Unique (cost=0.43..249907.42 rows=92 width=8) (actual time=0.057..3537.626 rows=92 loops=1)
-> Index Only Scan using wow_auction_gordunni_time_updated_idx on wow_auction_gordunni (cost=0.43..228163.42 rows=8697599 width=8) (actual time=0.055..2488.408 rows=8696562 loops=1)
Heap Fetches: 85796
Planning time: 0.726 ms
Execution time: 3537.800 ms
So it seems the index helps a bit (Postgres now uses the index), but not dramatically.
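One technique that is not in the answers below, but that can make a DISTINCT over a low-cardinality indexed column dramatically faster, is emulating a "loose index scan" with a recursive CTE: Postgres then jumps from one distinct value to the next through the index instead of reading every row. A sketch, assuming the same table and column:

```sql
-- Emulated loose index scan: roughly one index probe per distinct value
-- (here ~92 probes) instead of scanning all ~8.6M index entries.
WITH RECURSIVE t AS (
    (SELECT time_updated FROM wow_auction_gordunni
     ORDER BY time_updated LIMIT 1)
  UNION ALL
    SELECT (SELECT time_updated FROM wow_auction_gordunni
            WHERE time_updated > t.time_updated
            ORDER BY time_updated LIMIT 1)
    FROM t
    WHERE t.time_updated IS NOT NULL
)
SELECT time_updated FROM t WHERE time_updated IS NOT NULL;
```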
Upvotes: 1
Views: 261
Reputation: 1
It will use an index scan as long as that makes sense. Sample data:
CREATE TABLE foo
AS
SELECT
x%3 AS x,
repeat( md5(x::text)::text, 200 ) AS t1
FROM generate_series(1,1e6) AS t(x);
CREATE INDEX ON foo (x);
VACUUM ANALYZE foo;
Query:
EXPLAIN ANALYZE SELECT DISTINCT x FROM Foo;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.42..28480.42 rows=200 width=32) (actual time=0.034..257.734 rows=3 loops=1)
-> Index Only Scan using foo_x_idx on foo (cost=0.42..25980.42 rows=1000000 width=32) (actual time=0.031..122.668 rows=1000000 loops=1)
Heap Fetches: 0
Planning time: 0.090 ms
Execution time: 257.764 ms
(5 rows)
So to optimize your second selection query, create an index on datetime and check the EXPLAIN ANALYZE output to see whether the index is used. If it isn't, or it doesn't help, you can SET enable_seqscan = off and rerun the query; that tells you what the savings would be, if any. You can paste both plans here and we can take a look.
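The enable_seqscan experiment described above might look like this in a psql session (the setting affects only the current session and should be reset afterwards):

```sql
SET enable_seqscan = off;   -- discourage sequential scans while testing
EXPLAIN ANALYZE SELECT DISTINCT x FROM foo;
RESET enable_seqscan;       -- restore the default planner behavior
```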
Upvotes: 1
Reputation: 247235
There is no way to optimize the second query; it has to scan the whole table to find all possible values of datetime.
The best you can do is to make sure the table does not get bloated by clearing it with TRUNCATE rather than DELETE, and to put enough RAM into the machine so that the whole table fits in RAM.
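As a sketch of the suggestion above: TRUNCATE avoids the dead tuples that DELETE leaves behind, but note it removes every row, so it only applies if the daily cleanup can discard the whole table at once:

```sql
-- DELETE leaves dead tuples that bloat the table until VACUUM reclaims them:
-- DELETE FROM wow_auction_gordunni WHERE time_updated < now() - interval '1 day';

-- TRUNCATE reclaims the space immediately, but removes ALL rows:
TRUNCATE wow_auction_gordunni;
```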
Upvotes: 0