Reputation: 489
I've written the following PostgreSQL query which works as it should. However, it seems to be awfully slow, sometimes taking up to 10 seconds to return a result. I'm sure there is something in my statement that is causing this to be slow.
Can anyone help determine why this query is slow?
SELECT DISTINCT ON (school_classes.class_id,attendance_calendar.school_date)
school_classes.class_id, school_classes.class_name, school_classes.grade_id
, school_gradelevels.linked_calendar, attendance_calendars.calendar_id
, attendance_calendar.school_date, attendance_calendar.minutes
, teacher_join_classes_subjects.staff_id, staff.first_name, staff.last_name
FROM school_classes
INNER JOIN school_gradelevels ON school_gradelevels.id=school_classes.grade_id
INNER JOIN teacher_join_classes_subjects ON teacher_join_classes_subjects.class_id=school_classes.class_id
INNER JOIN staff ON staff.staff_id=teacher_join_classes_subjects.staff_id
INNER JOIN attendance_calendars ON attendance_calendars.title=school_gradelevels.linked_calendar
INNER JOIN attendance_calendar ON attendance_calendar.calendar_id=attendance_calendars.calendar_id
WHERE teacher_join_classes_subjects.syear='2013'
AND staff.syear='2013'
AND attendance_calendars.syear='2013'
AND teacher_join_classes_subjects.does_attendance='Y'
AND teacher_join_classes_subjects.subject_id IS NULL
AND attendance_calendar.school_date<CURRENT_DATE
AND attendance_calendar.school_date NOT IN (
SELECT com.school_date FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
I replaced the NOT IN
with the following:
AND NOT EXISTS (
SELECT com.school_date
FROM attendance_completed com
WHERE com.class_id=school_classes.class_id
AND com.school_date=attendance_calendar.school_date
AND (com.period_id='101' AND attendance_calendar.minutes>='151' OR
com.period_id='95' AND attendance_calendar.minutes='150') )
Result of EXPLAIN ANALYZE:
Unique (cost=2998.39..2998.41 rows=3 width=85) (actual time=10751.111..10751.118 rows=1 loops=1) -> Sort (cost=2998.39..2998.40 rows=3 width=85) (actual time=10751.110..10751.110 rows=2 loops=1) Sort Key: school_classes.class_id, attendance_calendar.school_date Sort Method: quicksort Memory: 25kB -> Hash Join (cost=2.03..2998.37 rows=3 width=85) (actual time=6409.471..10751.045 rows=2 loops=1) Hash Cond: ((teacher_join_classes_subjects.class_id = school_classes.class_id) AND (school_gradelevels.id = school_classes.grade_id)) Join Filter: (NOT (SubPlan 1)) -> Nested Loop (cost=0.00..120.69 rows=94 width=81) (actual time=2.468..1187.397 rows=26460 loops=1) Join Filter: (attendance_calendars.calendar_id = attendance_calendar.calendar_id) -> Nested Loop (cost=0.00..42.13 rows=1 width=70) (actual time=0.087..3.247 rows=735 loops=1) Join Filter: ((attendance_calendars.title)::text = (school_gradelevels.linked_calendar)::text) -> Nested Loop (cost=0.00..40.80 rows=1 width=277) (actual time=0.077..1.005 rows=245 loops=1) -> Nested Loop (cost=0.00..39.61 rows=1 width=27) (actual time=0.064..0.572 rows=49 loops=1) -> Seq Scan on teacher_join_classes_subjects (cost=0.00..10.48 rows=4 width=14) (actual time=0.022..0.143 rows=49 loops=1) Filter: ((subject_id IS NULL) AND (syear = 2013::numeric) AND ((does_attendance)::text = 'Y'::text)) -> Index Scan using staff_pkey on staff (cost=0.00..7.27 rows=1 width=20) (actual time=0.006..0.007 rows=1 loops=49) Index Cond: (staff.staff_id = teacher_join_classes_subjects.staff_id) Filter: (staff.syear = 2013::numeric) -> Seq Scan on attendance_calendars (cost=0.00..1.18 rows=1 width=250) (actual time=0.003..0.006 rows=5 loops=49) Filter: (attendance_calendars.syear = 2013::numeric) -> Seq Scan on school_gradelevels (cost=0.00..1.15 rows=15 width=11) (actual time=0.001..0.005 rows=15 loops=245) -> Seq Scan on attendance_calendar (cost=0.00..55.26 rows=1864 width=18) (actual time=0.003..1.129 rows=1824 loops=735) Filter: (attendance_calendar.school_date Hash (cost=1.41..1.41 rows=41 width=18) (actual time=0.040..0.040 rows=41 loops=1) -> Seq Scan on school_classes (cost=0.00..1.41 rows=41 width=18) (actual time=0.006..0.015 rows=41 loops=1) SubPlan 1 -> Seq Scan on attendance_completed com (cost=0.00..958.28 rows=5 width=4) (actual time=0.228..5.411 rows=17 loops=1764) Filter: ((class_id = $0) AND (((period_id = 101::numeric) AND ($1 >= 151::numeric)) OR ((period_id = 95::numeric) AND ($1 = 150::numeric))))
Upvotes: 3
Views: 771
Reputation: 656706
NOT EXISTS
is an excellent choice. Almost always better than NOT IN
. More details here.
I simplified your query a bit (which looks fine, generally):
SELECT DISTINCT ON (c.class_id, a.school_date)
c.class_id, c.class_name, c.grade_id
,g.linked_calendar, aa.calendar_id
,a.school_date, a.minutes
,t.staff_id, s.first_name, s.last_name
FROM school_classes c
JOIN teacher_join_classes_subjects t USING (class_id)
JOIN staff s USING (staff_id)
JOIN school_gradelevels g ON g.id = c.grade_id
JOIN attendance_calendars aa ON aa.title = g.linked_calendar
JOIN attendance_calendar a ON a.calendar_id = aa.calendar_id
WHERE t.syear = 2013
AND s.syear = 2013
AND aa.syear = 2013
AND t.does_attendance = 'Y' -- looks like it should be boolean!
AND t.subject_id IS NULL
AND a.school_date < CURRENT_DATE
AND NOT EXISTS (
SELECT 1
FROM attendance_completed x
WHERE x.class_id = c.class_id
AND x.school_date = a.school_date
AND (x.period_id = 101 AND a.minutes >= 151 OR -- actually numbers?
x.period_id = 95 AND a.minutes = 150)
)
ORDER BY c.class_id, a.school_date, ???
What seems to be missing is ORDER BY
which should accompany your DISTINCT ON
. Add more ORDER BY
items in place of ???
. If there are duplicates to pick from, you probably want to define which to pick.
Numeric literals don't need single quotes and boolean
values should be coded as such.
You may want to revisit the chapter about data types.
Upvotes: 2