Reputation: 1515
I've never had to ask an SO question before, I always eventually stumble upon what I'm looking for through searching, but there's a lot going on here and I've gotten to the end of my searching prowess... Thanks for your help / suggestions.
I need to generate reports on the fly about how many 'students' in my organization have taken each 'assessment', and the standards associated with each 'assessment'.
I'm trying to do this in the most performant way that minimizes the load on Postgres but I don't believe I've achieved that goal...
Server: Amazon EC2 running Ubuntu
The App (Server): Node.js (0.10.26)
The App (Client): Angular.js (1.2.13)
Database (Queries): PostgreSQL (9.3.1)
Database (Cache): MongoDB (Hope to cache report in future)
assessments
_id int4
description text
category varchar(60)
sub_standards int4 ARRAY
...
assessments_report (pre-calculated sums)
assessment_id(FK) int4
client_id(FK) int4
students int4
completed int4
incomplete int4
...
sub_standards
_id
standard_id(FK) int4
description varchar(255)
...
standards
_id int4
name varchar(60)
description varchar(255)
...
//Stored as array for readability in Node-Postgres use
SELECT r.*, as.* FROM assessments_report r
INNER JOIN (
SELECT a._id AS assessment_id, a.description, a.category, a.states,
array_to_json(array_agg(ss.*)) AS standards
FROM assessments a LEFT JOIN (
SELECT ss.*, s.name AS parent_name, s.description AS parent_description
FROM sub_standards ss
INNER JOIN standards s ON ss.standard_id = s._id
) ss ON ss._id = ANY (a.sub_standards) GROUP BY a._id
) as
ON as.assessment_id = r.assessment_id
WHERE r.client_id = $1
assessment_id: 2,
students: 2,
complete: 1
incomplete: 1,
description: "...",
category: "...",
states: ["AL","AZ",...],
standards: [
{
_id: 1,
standard_id: 3,
description: "...",
parent_name: "...",
parent_description: "..."
},
{
_id: 2,
standard_id: 4,
description: "...",
parent_name: "...",
parent_description: "..."
},
]
Upvotes: 5
Views: 10769
Reputation: 17867
EXPLAIN
of your query produces a max cost of 537. You can take that down to 59 (factor 10 improvement) by just creating an index:
CREATE INDEX idx_sstandard_sid ON sub_standards (standard_id);
But this is even faster with a cost of 21:
WITH r AS (SELECT * FROM assessments_report WHERE client_id = $1)
, ss AS (SELECT r.*, a.description, a.category, a.states , a.sub_standards
FROM r INNER JOIN assessments a ON a._id = r.assessment_id)
, ss3 AS (SELECT ss2.*, s.name parent_name, s.description parent_description
FROM sub_standards ss2
INNER JOIN standards s
ON ss2.standard_id = s._id)
, ss1 AS (SELECT ss.*, array_to_json(array_agg(ss3.*)) standards
FROM ss
LEFT JOIN ss3
ON ss3._id = ANY (ss.sub_standards)
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14 )
SELECT _id, client_id, region_id, group_id, assessment_id, students, basic,
proficient, advanced, in_progress, description, category, states, standards
FROM ss1;
You could probably get rid of the grouping and go somewhat faster than this.
The basic idea is to start with the most limiting constraint (client_id), expand that by the next criteria (sub_standards) and never expand the recordset any more than necessary after that. Lookups against primary keys are faster than complex joins. Hope this helps.
Final query plan:
CTE Scan on ss1 (cost=21.79..21.81 rows=1 width=434)
CTE r
-> Seq Scan on assessments_report (cost=0.00..1.06 rows=1 width=40)
Filter: (client_id = 4)
CTE ss
-> Nested Loop (cost=0.14..8.19 rows=1 width=434)
-> CTE Scan on r (cost=0.00..0.02 rows=1 width=40)
-> Index Scan using assesments_pkey on assessments a (cost=0.14..8.16 rows=1 width=398)
Index Cond: (_id = r.assessment_id)
CTE ss3
-> Hash Join (cost=1.07..12.34 rows=3 width=1346)
Hash Cond: (s._id = ss2.standard_id)
-> Seq Scan on standards s (cost=0.00..10.90 rows=90 width=818)
-> Hash (cost=1.03..1.03 rows=3 width=532)
-> Seq Scan on sub_standards ss2 (cost=0.00..1.03 rows=3 width=532)
CTE ss1
-> HashAggregate (cost=0.19..0.20 rows=1 width=462)
-> Nested Loop Left Join (cost=0.00..0.15 rows=1 width=462)
Join Filter: (ss3._id = ANY (ss.sub_standards))
-> CTE Scan on ss (cost=0.00..0.02 rows=1 width=434)
-> CTE Scan on ss3 (cost=0.00..0.06 rows=3 width=32)
Upvotes: 3