Reputation: 3939
I have two tables:
User id | name ..
Pull Requests id | user_id | created_at |...
I need to fetch all users join them with the count of their pull requests of a particular year. So I wrote a query like so:
SELECT users.*, COUNT(pull_requests.id) as pull_requests_count
FROM "users" INNER JOIN
"pull_requests"
ON "pull_requests"."user_id" = "users"."id"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
GROUP BY users.id
I initially had indexes on,
pull_requests.user_id (btree). On doing explain I got this:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=18.93..18.96 rows=3 width=2775)
-> Hash Join (cost=14.13..18.92 rows=3 width=2775)
Hash Cond: (users.id = pull_requests.user_id)
-> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771)
-> Hash (cost=14.09..14.09 rows=3 width=8)
-> Bitmap Heap Scan on pull_requests (cost=4.28..14.09 rows=3 width=8)
Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision)
-> Bitmap Index Scan on pull_req_extract_year_created_at_ix (cost=0.00..4.28 rows=3 width=0)
Index Cond: (date_part('year'::text, created_at) = 2013::double precision)
Then I added an index like this:
CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at));
And now my explain is:
QUERY PLAN
--------------------------------------------------------------------------------------------
HashAggregate (cost=63.99..64.02 rows=3 width=2775)
-> Hash Join (cost=59.19..63.98 rows=3 width=2775)
Hash Cond: (users.id = pull_requests.user_id)
-> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771)
-> Hash (cost=59.16..59.16 rows=3 width=8)
-> Seq Scan on pull_requests (cost=0.00..59.16 rows=3 width=8)
Filter: (date_part('year'::text, created_at) = 2013::double precision)
Still I get 6.6 ms for 100 or so rows. How do I further optimize this?
Thanks!
Upvotes: 1
Views: 90
Reputation: 1269823
Try combining the two indexes into one:
CREATE INDEX pr_ix ON pull_requests(EXTRACT(year FROM created_at), user_id);
and then phrasing the query as:
SELECT users.*, pull_requests_count
FROM "users" INNER JOIN
(select user_id, count(*) as pull_requests_count
from "pull_requests"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
group by user_id
) pr
ON pr."user_id" = "users"."id";
The index completely covers the subquery, so the original table will not be needed, just an index scan. This can then be joined back to the users.
Upvotes: 1