Reputation: 64900
I have a simple Django site backed by a PostgreSQL 9.3 database, with a single table storing user accounts (name, email, address, phone, active, etc.). The user table is fairly large, at around 2.6 million records. Django's admin felt a little slow, and django-debug-toolbar showed that almost all queries ran in under 1 ms, except for:
SELECT COUNT(*) FROM "myapp_myuser" WHERE "myapp_myuser"."active" = true;
which took about 7000 ms. However, the active column is indexed using Django's standard db_index=True, which generates the index:
CREATE INDEX myapp_myuser_active
ON myapp_myuser
USING btree
(active);
Checking the query with EXPLAIN via:
EXPLAIN ANALYZE VERBOSE
SELECT COUNT(*) FROM "myapp_myuser" WHERE "myapp_myuser"."active" = true;
returns:
Aggregate  (cost=109305.45..109305.46 rows=1 width=0) (actual time=7342.973..7342.974 rows=1 loops=1)
  Output: count(*)
  ->  Seq Scan on public.myapp_myuser  (cost=0.00..102638.16 rows=2666916 width=0) (actual time=0.035..4765.059 rows=2666337 loops=1)
        Output: id, created, category_id, name, email, address_1, address_2, city, active, (...)
        Filter: myapp_myuser.active
Total runtime: 7343.031 ms
It appears to not be using the index at all. Am I reading this right?
Running just SELECT COUNT(*) FROM "myapp_myuser" completed in about 500 ms. Why such a disparity in run times, even though the only column being used is indexed?
How can I better optimize this query?
Upvotes: 0
Views: 502
Reputation: 95751
The plan shows the scan reading a lot of columns out of a wide table, so this might not help, even though it does result in a bitmap index scan in my test below.
Try a partial index.
create index on myapp_myuser (active) where active = true;
I made a test table with a couple million rows.
explain analyze verbose
select count(*) from test where active = true;
"Aggregate (cost=41800.79..41800.81 rows=1 width=0) (actual time=500.756..500.756 rows=1 loops=1)"
" Output: count(*)"
" -> Bitmap Heap Scan on public.test (cost=8085.76..39307.79 rows=997200 width=0) (actual time=126.233..386.834 rows=1000000 loops=1)"
" Output: id, active"
" Filter: test.active"
" -> Bitmap Index Scan on test_active_idx1 (cost=0.00..7836.45 rows=497204 width=0) (actual time=123.398..123.398 rows=1000000 loops=1)"
" Index Cond: (test.active = true)"
"Total runtime: 500.794 ms"
When you write queries that you hope will use a partial index, the query's WHERE clause needs to match the WHERE clause of the index. Using WHERE active is true is valid in PostgreSQL, but it doesn't match the WHERE clause in the partial index, so you'll get a sequential scan again.
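As a rough sketch of that matching rule, here are the two spellings side by side. It assumes the partial index above has been created on the table from the question; the comments restate the behavior described in this answer rather than output from your database, so check the plans on your own data.

```sql
-- Assumes this partial index already exists (from earlier in this answer):
--   create index on myapp_myuser (active) where active = true;

-- Predicate written the same way as the index's WHERE clause,
-- so the planner can consider the partial index.
explain analyze
select count(*) from myapp_myuser where active = true;

-- Valid PostgreSQL, but spelled differently from the index's WHERE clause.
-- As described above, expect a sequential scan here instead.
explain analyze
select count(*) from myapp_myuser where active is true;
```

The query in the question is already written as "active" = true, so it is in the matching form.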
Upvotes: 2