Reputation: 2301
I'm trying to optimize some of my SELECT queries using EXPLAIN ANALYZE, and I can't understand why PostgreSQL uses a sequential scan instead of an index scan:
explain analyze SELECT SUM(a.deure)-SUM(a.haver) as Value FROM assentaments a
LEFT JOIN comptes c ON a.compte_id = c.id WHERE c.empresa_id=2 AND c.nivell=11 AND
(a.data >='2007-01-01' AND a.data <='2007-01-31') AND c.codi_compte LIKE '6%';
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=44250.26..44250.27 rows=1 width=12)
(actual time=334.054..334.054 rows=1 loops=1)
-> Nested Loop (cost=0.00..44249.20 rows=211 width=12)
(actual time=65.277..333.179 rows=713 loops=1)
-> Seq Scan on comptes c (cost=0.00..8001.72 rows=118 width=4)
(actual time=0.053..64.287 rows=236 loops=1)
Filter: (((codi_compte)::text ~~ '6%'::text) AND
(empresa_id = 2) AND (nivell = 11))
-> Index Scan using index_compte_id on assentaments a
(cost=0.00..307.16 rows=2 width=16) (actual time=0.457..1.138 rows=3 loops=236)
Index Cond: (a.compte_id = c.id)
Filter: ((a.data >= '2007-01-01'::date) AND (a.data <= '2007-01-31'::date))
Total runtime: 334.104 ms
(8 rows)
I've created a custom index:
CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST,
empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);
I've also created separate indexes on these three columns of the comptes table, just to check whether the planner would pick an index scan, but it doesn't; the result is the same:
CREATE INDEX "index_codi_compte" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);
CREATE INDEX "index_comptes" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);
CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST, empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);
CREATE INDEX "index_nivell" ON "public"."comptes" USING btree(nivell ASC NULLS LAST);
thanks!
m.
assentaments.id and assentaments.data are also indexed.
select count(*) FROM comptes => 148498
select count(*) from assentaments => 2128771
select count(distinct(codi_compte)) FROM comptes => 137008
select count(distinct(codi_compte)) FROM comptes WHERE codi_compte LIKE '6%' => 368
select count(distinct(codi_compte)) FROM comptes WHERE codi_compte LIKE '6%' AND empresa_id=2; => 303
Upvotes: 13
Views: 4342
Reputation: 11581
If you want an index on a TEXT column to be usable for LIKE queries (and the database does not use the C locale), you need to create it with text_pattern_ops, like this:
test=> CREATE TABLE t AS SELECT n::TEXT FROM generate_series( 1,100000 ) n;
test=> CREATE INDEX tn ON t(n);
test=> VACUUM ANALYZE t;
test=> EXPLAIN ANALYZE SELECT * FROM t WHERE n LIKE '123%';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..1693.00 rows=10 width=5) (actual time=0.027..14.631 rows=111 loops=1)
Filter: (n ~~ '123%'::text)
Total runtime: 14.664 ms
test=> CREATE INDEX tn2 ON t(n text_pattern_ops);
CREATE INDEX
Time: 267.589 ms
test=> EXPLAIN ANALYZE SELECT * FROM t WHERE n LIKE '123%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=5.25..244.79 rows=10 width=5) (actual time=0.089..0.121 rows=111 loops=1)
Filter: (n ~~ '123%'::text)
-> Bitmap Index Scan on tn2 (cost=0.00..5.25 rows=99 width=0) (actual time=0.077..0.077 rows=111 loops=1)
Index Cond: ((n ~>=~ '123'::text) AND (n ~<~ '124'::text))
Total runtime: 0.158 ms
See the details here:
http://www.postgresql.org/docs/9.1/static/indexes-opclass.html
If you do not want to create an additional index, and the column is TEXT, you can replace "codi_compte LIKE '6%'" with "codi_compte >= '6' AND codi_compte < '7'", which is a simple index range condition.
test=> EXPLAIN ANALYZE SELECT * FROM t WHERE n >= '123' AND n < '124';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using tn on t (cost=0.00..126.74 rows=99 width=5) (actual time=0.030..0.127 rows=111 loops=1)
Index Cond: ((n >= '123'::text) AND (n < '124'::text))
Total runtime: 0.153 ms
In your case this solution is probably better.
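Applied to the query from the question, the range rewrite might look like the sketch below. It assumes that codi_compte is a character column and that the database collation sorts every value beginning with '6' between '6' and '7'. Both assumptions are worth checking before relying on it.

```sql
-- Sketch: the original query with the LIKE prefix replaced by a range.
-- Assumes codi_compte is a character type (text/varchar), not a number.
EXPLAIN ANALYZE
SELECT SUM(a.deure) - SUM(a.haver) AS value
FROM assentaments a
LEFT JOIN comptes c ON a.compte_id = c.id
WHERE c.empresa_id = 2
  AND c.nivell = 11
  AND a.data >= '2007-01-01' AND a.data <= '2007-01-31'
  -- range condition that a plain btree index on codi_compte can serve
  AND c.codi_compte >= '6' AND c.codi_compte < '7';
```

If the plan still shows a Seq Scan on comptes, the planner may simply be estimating that the scan is cheaper for this table size.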
Upvotes: 7
Reputation: 115550
I would try with a compound index (data, compte_id) on table assentaments and a compound index (empresa_id, nivell, codi_compte, id) on table comptes.
You should also turn that LEFT JOIN into an INNER JOIN. The WHERE conditions you have make them equivalent. Perhaps the query planner is not aware of it.
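A minimal sketch of these suggestions follows. The index names are hypothetical (they do not come from the question), and whether the planner actually picks the new indexes depends on your data and statistics.

```sql
-- Hypothetical index names; choose whatever fits your naming scheme.
CREATE INDEX index_assentaments_data_compte
    ON assentaments (data, compte_id);

CREATE INDEX index_comptes_empresa_nivell_codi
    ON comptes (empresa_id, nivell, codi_compte, id);

-- Refresh planner statistics after creating the indexes.
ANALYZE assentaments;
ANALYZE comptes;

-- Same query with INNER JOIN. The WHERE conditions on c already discard
-- the NULL-extended rows a LEFT JOIN would produce.
EXPLAIN ANALYZE
SELECT SUM(a.deure) - SUM(a.haver) AS value
FROM assentaments a
INNER JOIN comptes c ON a.compte_id = c.id
WHERE c.empresa_id = 2
  AND c.nivell = 11
  AND a.data >= '2007-01-01' AND a.data <= '2007-01-31'
  AND c.codi_compte LIKE '6%';
```

The equality columns (empresa_id, nivell) come first in the comptes index so that the prefix condition on codi_compte can narrow the range within them.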
Another suspicion is the type of the field comptes.codi_compte. If it is integer and not char(), then the
WHERE c.codi_compte LIKE '6%'
is translated as:
WHERE CAST(c.codi_compte AS CHAR) LIKE '6%'
which means the index cannot be used. If that's the case, you can convert the field to a char type.
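One way to check the column type is to query the standard information_schema catalog, as sketched below. The schema name public is taken from the CREATE INDEX statements in the question. Note also that the plan in the question shows the filter as (codi_compte)::text ~~ '6%'::text, which suggests a character type such as varchar rather than an integer, though that reading is an inference and not something the question states.

```sql
-- Look up the declared type of comptes.codi_compte.
SELECT column_name, data_type, character_maximum_length
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name   = 'comptes'
  AND column_name  = 'codi_compte';
```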
Upvotes: 0
Reputation: 10206
There are a few things you could/should do. First:
SELECT SUM(a.deure)-SUM(a.haver) as Value
SUM() will touch every row that matches... there is no way to INDEX that operation.
FROM assentaments a, comptes c
When debugging queries, I find it easier to use an implicit join (a comma-separated FROM list, as above) instead of an explicit JOIN. The query planner is freed up a bit more and often makes a better choice. That's not the case here; it's just a general comment. Here's where there are likely mismatches between your INDEXes and your query.
WHERE TRUE = TRUE
AND a.compte_id = c.id
AND c.empresa_id = 2
AND c.nivell = 11
For those three conditions, you have the following INDEX:
CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST, empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);
Break that apart. Since this isn't a UNIQUE INDEX, you shouldn't see any change in the integrity of your data. The reason I'm suggesting this is that I'd guess codi_compte has a low cardinality and empresa_id a higher one. In general, order the columns of an INDEX from highest cardinality to lowest.
I suspect that with three separate INDEXes the planner will do a bitmap scan or hash join faster. The crux of the problem is that PostgreSQL (probably correctly) thinks that doing an index scan is more expensive than doing a seq scan.
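To see whether the planner's cost judgment holds, one option is to discourage sequential scans for the current session and compare timings. This is a diagnostic sketch only, not a setting to leave on in production; enable_seqscan is a standard PostgreSQL planner setting.

```sql
-- Diagnostic only: makes sequential scans look very expensive to the
-- planner for this session, so it picks an index if one is usable.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT SUM(a.deure) - SUM(a.haver) AS value
FROM assentaments a, comptes c
WHERE a.compte_id = c.id
  AND c.empresa_id = 2
  AND c.nivell = 11
  AND a.data >= '2007-01-01' AND a.data <= '2007-01-31'
  AND c.codi_compte LIKE '6%';

-- Restore the default planner behaviour.
RESET enable_seqscan;
```

If the plan with index scans turns out slower than the original 334 ms, the planner was right to prefer the sequential scan. If comptes is still scanned sequentially, no existing index is usable for these conditions.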
AND (a.data >='2007-01-01' AND a.data <='2007-01-31')
AND c.codi_compte LIKE '6%';
An INDEX on a.data could also be helpful, because PostgreSQL would likely do an index scan on the date range, depending on the number of rows in the assentaments table.
CREATE INDEX "index_codi_compte" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);
CREATE INDEX "index_comptes" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST);
I don't know why you have this INDEX twice.
CREATE INDEX "index_multiple" ON "public"."comptes" USING btree(codi_compte ASC NULLS LAST, empresa_id ASC NULLS LAST, nivell ASC NULLS LAST);
As per above, break that INDEX apart.
CREATE INDEX "index_nivell" ON "public"."comptes" USING btree(nivell ASC NULLS LAST);
That INDEX is fine.
Quick tip:
SELECT matching, total, matching / total AS "Want this to be a small number"
FROM
(SELECT count(*)::FLOAT AS matching FROM tbl WHERE col_id = 1) AS matching,
(SELECT count(*)::FLOAT AS total FROM tbl) AS total;
matching rows | total rows | want this to be a small number
---------------+------------+--------------------------------
1 | 10 | 0.1
(1 row)
Where the third column ideally is equal to 1/total.
Upvotes: 0
Reputation: 8514
This is most commonly due to poor selectivity of the indexed column: if the index is not selective enough (for example, many repeated values), accessing and filtering through the index can take even more time than a seq scan.
Are your values in c.codi_compte selective enough? Maybe you have too many null values?
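A quick way to answer both questions is to look at the statistics the planner itself uses, via the pg_stats view. The sketch below assumes the table lives in the public schema, as in the question's CREATE INDEX statements.

```sql
-- Make sure the statistics are current.
ANALYZE comptes;

-- null_frac:  fraction of rows where the column is NULL.
-- n_distinct: estimated number of distinct values; a negative value is
--             a fraction of the row count (-1 means all values distinct).
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'comptes'
  AND attname IN ('codi_compte', 'empresa_id', 'nivell');
```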
Upvotes: 0
Reputation: 86735
It appears that the DBMS is estimating that the JOIN on assentaments will be much more restrictive than filtering comptes, then joining.
Options could be...
1. Put an index on assentaments.compte_id
2. Alter your index on comptes to include id as the first indexed field.
The first option may allow the execution plan to reverse: Filter comptes, then join to assentaments.
The second option may allow the execution plan to stay the same, but enable the use of the index.
Upvotes: 2