Reputation: 23
I am using PostgreSQL for its array functionality. Here's my schema:
CREATE TABLE questions (
id INTEGER PRIMARY KEY,
product_id INTEGER UNIQUE NOT NULL,
body VARCHAR(1000) NOT NULL,
date_written DATE NOT NULL DEFAULT current_date,
asker_name VARCHAR(60) NOT NULL,
asker_email VARCHAR(60) NOT NULL,
reported BOOLEAN DEFAULT FALSE,
helpful INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE answers (
id PRIMARY KEY NOT NULL,
question_id INTEGER NOT NULL,
body VARCHAR(1000) NOT NULL,
date_written DATE NOT NULL DEFAULT current_date,
answerer_name VARCHAR(60) NOT NULL,
answerer_email VARCHAR(60) NOT NULL,
reported BOOLEAN DEFAULT FALSE,
helpful INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE photos (
id INTEGER UNIQUE,
answer_id INTEGER NOT NULL,
photo VARCHAR(200)
);
I am trying to query my answers table to get a list of all the answers for a given question id and include an array of all photos that exist for that given answer_id. The results should be sorted in descending order of helpfulness. So far, I have a massive query that displays the results I'm looking for, but the execution time is 729.595 ms. I am trying to optimize to get the query's time down to 200 ms. I have the following indexes to try and optimize my query times:
indexname | indexdef
-----------------+---------------------------------------------------------------------------
answer_id | CREATE UNIQUE INDEX answer_id ON public.answers USING btree (id)
question_id | CREATE INDEX question_id ON public.answers USING btree (question_id)
idx_reported_id | CREATE INDEX idx_reported_id ON public.answers USING btree (reported, id)
answers_pkey | CREATE UNIQUE INDEX answers_pkey ON public.answers USING btree (id)
indexname | indexdef
----------------+----------------------------------------------------------------------------
id | CREATE UNIQUE INDEX id ON public.questions USING btree (id)
idx_q_reported | CREATE INDEX idx_q_reported ON public.questions USING btree (id, reported)
questions_pkey | CREATE UNIQUE INDEX questions_pkey ON public.questions USING btree (id)
indexname | indexdef
---------------+---------------------------------------------------------------------
photos_id_key | CREATE UNIQUE INDEX photos_id_key ON public.photos USING btree (id)
p_links | CREATE INDEX p_links ON public.photos USING btree (photo)
In my analysis, I noticed that the GroupAggregate is time-consuming: GroupAggregate (cost=126222.21..126222.71 rows=25 width=129) (actual time=729.497..729.506 rows=5 loops=1) Group Key: answers.id
Is there a way I can avoid the time-consuming GROUP BY? Am I missing something with the indexes? Here is the query itself:
SELECT answers.id,
question_id,
body,
date_written,
answerer_name,
answerer_email,
reported,
helpful,
ARRAY_AGG(photo) as photos
FROM answers
LEFT JOIN photos ON answers.id = photos.answer_id
WHERE reported IS
false AND answers.id IN (SELECT id
FROM answers
WHERE question_id = 20012)
GROUP BY answers.id
ORDER BY helpful DESC;
Thanks!
Upvotes: 1
Views: 802
Reputation:
One way that often works, is to aggregate first, then join on the result (rather than aggregating the full result). And you don't really need the IN condition either
SELECT a.id,
a.question_id,
a.body,
a.date_written,
a.answerer_name,
a.answerer_email,
a.reported,
a.helpful,
p.photos
FROM answers a
LEFT JOIN (
select answer_id, array_agg(photo) as photos
from photos
group by answer_id
) p ON a.id = p.answer_id
WHERE reported IS false
AND a.question_id = 20012
ORDER BY a.helpful DESC;
Upvotes: 1
Reputation: 2039
I think you can skip the subquery:
SELECT answers.id, question_id, body, date_written, answerer_name, answerer_email, reported, helpful, ARRAY_AGG(photo) as photos
FROM answers
LEFT JOIN photos ON answers.id = photos.answer_id
WHERE reported IS false AND question_id = 20012
GROUP BY answers.id, question_id, body, date_written, answerer_name, answerer_email, reported, helpful
ORDER BY helpful DESC;
You can add a btree index on photos.answer_id because this field is use in the join clause.
You losed same fields on the GROUP BY clause;
Upvotes: 1