Reputation: 1722
I have the following query with multiple CTEs. Without the departure_dates and arrival_dates CTEs, the query finished in about 12sec. When I add in the departure_dates and arrival_dates CTEs the query takes so long that I have to cancel it. I can't run explain analyze bc of the same reason. With running explain, it says the costliest node is the joins at the end a.arrival_date = r.arrival_date
. From what I can see on the explain, it's using the indexes I have for the dates CTEs. So my question is, what would cause it to slow down so much with the addition of these two and is there any way I would be able to optimize/fix it?
WITH
relationships AS (
SELECT rt.from_location_code,
rt.to_location_code,
rt.departure_datetime AS departure_date,
rt.arrival_datetime AS arrival_date,
rt.ticketno,
rt.leg_number,
rt.leg_direction,
rt.service_direction,
rt.booking_reference,
fj.transaction_id,
fj.ticket_id,
fj.ticket_number,
rt.batch_id
FROM sch1.booked_travel rt
LEFT JOIN (
SELECT DISTINCT t.transaction_id, sdt.ticket_number, sdt.ticket_id
FROM sch2.ticket_transactions t
JOIN sch2.tickets sdt USING (ticket_id)
JOIN sch1.reserved_travels ert ON ert.ticketno = sdt.ticket_number
) fj ON fj.ticket_number = rt.ticketno
WHERE batch_id = 234
),
xaction AS (
SELECT DISTINCT t.transaction_id, sdt.ticket_number, sdt.ticket_id
FROM sch2.ticket_transactions t
JOIN sch2.tickets sdt USING (ticket_id)
JOIN relationships ert ON ert.ticketno = sdt.ticket_number
WHERE ert.batch_id = 234
),
from_location AS (
SELECT DISTINCT l.location_id AS from_location_id, rt.from_location_code
FROM sch2.locations l
JOIN relationships rt ON rt.from_location_code = l.code
WHERE rt.batch_id = 234
),
to_location AS (
SELECT DISTINCT d.location_id AS to_location_id, rt.to_location_code
FROM sch2.locations d
JOIN relationships rt ON rt.to_location_code = d.code
WHERE rt.batch_id = 234
),
departure_dates as (
select distinct on (d.date_id, d.the_date) d.date_id as departure_date_id, d.the_date as departure_date
from sch2.dates d
join relationships r on r.departure_date = d.the_date
where r.batch_id = 234
),
arrival_dates as (
select distinct on (d.date_id, d.the_date) d.date_id as arrival_date_id, d.the_date as arrival_date
from sch2.dates d
join relationships r on r.arrival_date = d.the_date
where r.batch_id = 234
)
INSERT
INTO sch2.travel_legs (transaction_id, ticket_id, leg_number, leg_direction,
from_location_id, to_location_id, service_direction,
departure_date_id, departure_date, arrival_date, arrival_date_id, booking_reference)
SELECT r.transaction_id,
r.ticket_id,
r.leg_number,
r.leg_direction,
f.from_location_id,
t.to_location_id,
r.service_direction,
d.departure_date_id,
r.departure_date,
r.arrival_date,
a.arrival_date_id,
r.booking_reference
FROM relationships r
LEFT JOIN xaction USING (transaction_id)
LEFT JOIN from_location f ON f.from_location_code = r.from_location_code
LEFT JOIN to_location t ON t.to_location_code = r.to_location_code
left join departure_dates d on r.departure_date = d.departure_date
left join arrival_dates a on a.arrival_date = r.arrival_date
RETURNING *;
Plan:
"Insert on sch2.travle_legs (cost=6500820.62..19312607.17 rows=425901878 width=678)"
" Output: travle_legs.journey_leg_id, travle_legs.transaction_id, travle_legs.customer_id, travle_legs.ticket_id, travle_legs.brand_id, travle_legs.leg_number, travle_legs.leg_direction, travle_legs.from_location_id, travle_legs.to_location_id, travle_legs.service_id, travle_legs.service_direction, travle_legs.departure_date, travle_legs.arrival_date, travle_legs.departure_date_id, travle_legs.arrival_date_id, travle_legs.journey_type, travle_legs.flight_id, travle_legs.booking_reference, travle_legs.scheduled_depature_date, travle_legs.scheduled_departure_date_id, travle_legs.actual_departure_date, travle_legs.actual_departure_date_id, travle_legs.scheduled_arrival_date, travle_legs.scheduled_arrival_date_id, travle_legs.actual_arrival_date, travle_legs.actual_arrival_date_id, travle_legs.departure_offset_minutes, travle_legs.departure_offset_time, travle_legs.departure_late, travle_legs.arrival_offset_minutes, travle_legs.arrival_offset_time, travle_legs.arrival_late"
" CTE relationships"
" -> Hash Right Join (cost=17457.33..769694.82 rows=73016 width=90)"
" Output: rt.from_location_code, rt.to_location_code, rt.departure_datetime, rt.arrival_datetime, COALESCE($0, '-1'::integer), (SubPlan 2), rt.ticketno, rt.leg_number, rt.leg_direction, COALESCE($2, '-1'::integer), rt.service_direction, $3, rt.booking_reference, fj.transaction_id, fj.ticket_id, fj.ticket_number, rt.sch1_batch_file_id"
" Hash Cond: ((fj.ticket_number)::text = (rt.ticketno)::text)"
" InitPlan 1 (returns $0)"
" -> Limit (cost=1.02..59.51 rows=1 width=4)"
" Output: c.service_id"
" -> Hash Join (cost=1.02..11112.93 rows=190 width=4)"
" Output: c.service_id"
" Inner Unique: true"
" Hash Cond: ((e.service_number)::text = (c.code)::text)"
" -> Seq Scan on sch1.booked_travels e (cost=0.00..10989.21 rows=45937 width=3)"
" Output: e.id, e.sale_id, e.ticketno, e.transaction_date, e.leg_number, e.leg_direction, e.from_location_code, e.to_location_code, e.company_brand_code, e.service_number, e.service_direction, e.departure_datetime, e.arrival_datetime, e.flight_code, e.booking_reference, e.sch1_batch_file_id, e.imported, e.created_at, e.updated_at, e.customer_id"
" Filter: (e.sch1_batch_file_id = 234)"
" -> Hash (cost=1.01..1.01 rows=1 width=72)"
" Output: c.service_id, c.code"
" -> Seq Scan on sch2.services c (cost=0.00..1.01 rows=1 width=72)"
" Output: c.service_id, c.code"
" InitPlan 3 (returns $2)"
" -> Function Scan on public.find_brand (cost=0.25..10.25 rows=1000 width=4)"
" Output: find_brand.brand_id"
" Function Call: find_brand('sch1.booked_travels'::regclass, 152, 'company_brand_code'::text)"
" InitPlan 4 (returns $3)"
" -> Limit (cost=1.02..219.88 rows=1 width=4)"
" Output: c_1.flight_id"
" -> Hash Join (cost=1.02..11163.01 rows=51 width=4)"
" Output: c_1.flight_id"
" Hash Cond: ((e_1.flight_code)::text = (c_1.flight_code)::text)"
" -> Seq Scan on sch1.booked_travels e_1 (cost=0.00..10989.21 rows=45937 width=3)"
" Output: e_1.id, e_1.sale_id, e_1.ticketno, e_1.transaction_date, e_1.leg_number, e_1.leg_direction, e_1.from_location_code, e_1.to_location_code, e_1.company_brand_code, e_1.service_number, e_1.service_direction, e_1.departure_datetime, e_1.arrival_datetime, e_1.flight_code, e_1.booking_reference, e_1.sch1_batch_file_id, e_1.imported, e_1.created_at, e_1.updated_at, e_1.customer_id"
" Filter: (e_1.sch1_batch_file_id = 234)"
" -> Hash (cost=1.01..1.01 rows=1 width=520)"
" Output: c_1.flight_id, c_1.flight_code"
" -> Seq Scan on sch2.flights c_1 (cost=0.00..1.01 rows=1 width=520)"
" Output: c_1.flight_id, c_1.flight_code"
" -> Subquery Scan on fj (cost=5110.26..6047.10 rows=46842 width=17)"
" Output: fj.transaction_id, fj.ticket_id, fj.ticket_number"
" -> HashAggregate (cost=5110.26..5578.68 rows=46842 width=17)"
" Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
" Group Key: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
" -> Hash Join (cost=2505.21..4758.94 rows=46842 width=17)"
" Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
" Hash Cond: ((ert.ticketno)::text = (sdt.ticket_number)::text)"
" -> Index Only Scan using idx_booked_travels_batch_id_ticketno on sch1.booked_travels ert (cost=0.41..1613.47 rows=45937 width=9)"
" Output: ert.ticketno, ert.sch1_batch_file_id"
" -> Hash (cost=2059.58..2059.58 rows=35617 width=17)"
" Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
" -> Hash Join (cost=1077.90..2059.58 rows=35617 width=17)"
" Output: t_1.transaction_id, sdt.ticket_number, sdt.ticket_id"
" Inner Unique: true"
" Hash Cond: (t_1.ticket_id = sdt.ticket_id)"
" -> Seq Scan on sch2.fact_ticket_transactions t_1 (cost=0.00..888.17 rows=35617 width=8)"
" Output: t_1.transaction_id, t_1.customer_id, t_1.ticket_id, t_1.agent_id, t_1.sale_date, t_1.sale_date_id, t_1.sales_channel, t_1.transaction_type, t_1.ticket_type, t_1.ticket_queue, t_1.fare_type, t_1.fare_value, t_1.adults, t_1.children, t_1.concessionary, t_1.young_person_coach_card, t_1.eurolines_young_person, t_1.hm_forces, t_1.infants, t_1.eurolines_seniors, t_1.over_sixties, t_1.coach_card_id, t_1.coach_card_consumer_type_id, t_1.account_id, t_1.distribution_type, t_1.ticket_email_id"
" -> Hash (cost=641.29..641.29 rows=34929 width=13)"
" Output: sdt.ticket_number, sdt.ticket_id"
" -> Seq Scan on sch2.tickets sdt (cost=0.00..641.29 rows=34929 width=13)"
" Output: sdt.ticket_number, sdt.ticket_id"
" -> Hash (cost=10989.21..10989.21 rows=45937 width=61)"
" Output: rt.from_location_code, rt.to_location_code, rt.departure_datetime, rt.arrival_datetime, rt.customer_id, rt.ticketno, rt.leg_number, rt.leg_direction, rt.service_direction, rt.booking_reference, rt.sch1_batch_file_id"
" -> Seq Scan on sch1.booked_travels rt (cost=0.00..10989.21 rows=45937 width=61)"
" Output: rt.from_location_code, rt.to_location_code, rt.departure_datetime, rt.arrival_datetime, rt.customer_id, rt.ticketno, rt.leg_number, rt.leg_direction, rt.service_direction, rt.booking_reference, rt.sch1_batch_file_id"
" Filter: (rt.sch1_batch_file_id = 234)"
" SubPlan 2"
" -> Function Scan on public.find_customer (cost=0.25..10.25 rows=1000 width=4)"
" Output: rt.customer_id"
" Function Call: find_customer('sch1.booked_travels'::regclass, 152)"
" CTE xaction"
" -> Unique (cost=2879.91..2883.63 rows=372 width=17)"
" Output: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
" -> Sort (cost=2879.91..2880.84 rows=372 width=17)"
" Output: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
" Sort Key: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
" -> Nested Loop (cost=1078.19..2864.03 rows=372 width=17)"
" Output: t_2.transaction_id, sdt_1.ticket_number, sdt_1.ticket_id"
" -> Hash Join (cost=1077.90..2721.72 rows=365 width=13)"
" Output: sdt_1.ticket_number, sdt_1.ticket_id"
" Inner Unique: true"
" Hash Cond: ((ert_1.ticketno)::text = (sdt_1.ticket_number)::text)"
" -> CTE Scan on relationships ert_1 (cost=0.00..1642.86 rows=365 width=32)"
" Output: ert_1.from_location_code, ert_1.to_location_code, ert_1.departure_date, ert_1.arrival_date, ert_1.service_id, ert_1.customer_id, ert_1.ticketno, ert_1.leg_number, ert_1.leg_direction, ert_1.brand_id, ert_1.service_direction, ert_1.flight_id, ert_1.booking_reference, ert_1.transaction_id, ert_1.ticket_id, ert_1.ticket_number, ert_1.sch1_batch_file_id"
" Filter: (ert_1.sch1_batch_file_id = 234)"
" -> Hash (cost=641.29..641.29 rows=34929 width=13)"
" Output: sdt_1.ticket_number, sdt_1.ticket_id"
" -> Seq Scan on sch2.tickets sdt_1 (cost=0.00..641.29 rows=34929 width=13)"
" Output: sdt_1.ticket_number, sdt_1.ticket_id"
" -> Index Only Scan using idx_ticket_xaction_ticket_id_transaction_id on sch2.fact_ticket_transactions t_2 (cost=0.29..0.38 rows=1 width=8)"
" Output: t_2.ticket_id, t_2.transaction_id"
" Index Cond: (t_2.ticket_id = sdt_1.ticket_id)"
" CTE from_location"
" -> Unique (cost=1676.67..1679.41 rows=365 width=36)"
" Output: l.location_id, rt_1.from_location_code"
" -> Sort (cost=1676.67..1677.58 rows=365 width=36)"
" Output: l.location_id, rt_1.from_location_code"
" Sort Key: l.location_id, rt_1.from_location_code"
" -> Hash Join (cost=17.31..1661.14 rows=365 width=36)"
" Output: l.location_id, rt_1.from_location_code"
" Inner Unique: true"
" Hash Cond: ((rt_1.from_location_code)::text = (l.code)::text)"
" -> CTE Scan on relationships rt_1 (cost=0.00..1642.86 rows=365 width=32)"
" Output: rt_1.from_location_code, rt_1.to_location_code, rt_1.departure_date, rt_1.arrival_date, rt_1.service_id, rt_1.customer_id, rt_1.ticketno, rt_1.leg_number, rt_1.leg_direction, rt_1.brand_id, rt_1.service_direction, rt_1.flight_id, rt_1.booking_reference, rt_1.transaction_id, rt_1.ticket_id, rt_1.ticket_number, rt_1.sch1_batch_file_id"
" Filter: (rt_1.sch1_batch_file_id = 234)"
" -> Hash (cost=10.47..10.47 rows=547 width=10)"
" Output: l.location_id, l.code"
" -> Seq Scan on sch2.locations l (cost=0.00..10.47 rows=547 width=10)"
" Output: l.location_id, l.code"
" CTE to_location"
" -> Unique (cost=1676.67..1679.41 rows=365 width=36)"
" Output: d_1.location_id, rt_2.to_location_code"
" -> Sort (cost=1676.67..1677.58 rows=365 width=36)"
" Output: d_1.location_id, rt_2.to_location_code"
" Sort Key: d_1.location_id, rt_2.to_location_code"
" -> Hash Join (cost=17.31..1661.14 rows=365 width=36)"
" Output: d_1.location_id, rt_2.to_location_code"
" Inner Unique: true"
" Hash Cond: ((rt_2.to_location_code)::text = (d_1.code)::text)"
" -> CTE Scan on relationships rt_2 (cost=0.00..1642.86 rows=365 width=32)"
" Output: rt_2.from_location_code, rt_2.to_location_code, rt_2.departure_date, rt_2.arrival_date, rt_2.service_id, rt_2.customer_id, rt_2.ticketno, rt_2.leg_number, rt_2.leg_direction, rt_2.brand_id, rt_2.service_direction, rt_2.flight_id, rt_2.booking_reference, rt_2.transaction_id, rt_2.ticket_id, rt_2.ticket_number, rt_2.sch1_batch_file_id"
" Filter: (rt_2.sch1_batch_file_id = 234)"
" -> Hash (cost=10.47..10.47 rows=547 width=10)"
" Output: d_1.location_id, d_1.code"
" -> Seq Scan on sch2.locations d_1 (cost=0.00..10.47 rows=547 width=10)"
" Output: d_1.location_id, d_1.code"
" CTE departure_dates"
" -> Unique (cost=8103.16..8149.19 rows=6137 width=12)"
" Output: d_2.date_id, d_2.the_date"
" -> Sort (cost=8103.16..8118.51 rows=6137 width=12)"
" Output: d_2.date_id, d_2.the_date"
" Sort Key: d_2.date_id, d_2.the_date"
" -> Nested Loop (cost=0.42..7717.04 rows=6137 width=12)"
" Output: d_2.date_id, d_2.the_date"
" -> CTE Scan on relationships r_1 (cost=0.00..1642.86 rows=365 width=8)"
" Output: r_1.from_location_code, r_1.to_location_code, r_1.departure_date, r_1.arrival_date, r_1.service_id, r_1.customer_id, r_1.ticketno, r_1.leg_number, r_1.leg_direction, r_1.brand_id, r_1.service_direction, r_1.flight_id, r_1.booking_reference, r_1.transaction_id, r_1.ticket_id, r_1.ticket_number, r_1.sch1_batch_file_id"
" Filter: (r_1.sch1_batch_file_id = 234)"
" -> Index Scan using idx_lkup_dates_date on sch2.lkup_dates d_2 (cost=0.42..16.47 rows=17 width=12)"
" Output: d_2.date_id, d_2.the_date, d_2.date_iso, d_2.date_text, d_2.date_abbreviation, d_2.day_of_week_number, d_2.day_of_week_name, d_2.day_of_week_short_name, d_2.first_date_of_week, d_2.last_date_of_week, d_2.day_of_month, d_2.day_of_month_number, d_2.month_name, d_2.month_name_with_year, d_2.month_short_name, d_2.month_short_name_with_year, d_2.day_of_month_name, d_2.first_date_of_month, d_2.last_date_of_month, d_2.calendar_week_number, d_2.calendar_week_name, d_2.calendar_week_name_with_year, d_2.calendar_week_short_name, d_2.calendar_week_short_name_with_year, d_2.calendar_week_yy_wk, d_2.calendar_month_number, d_2.calendar_month_yy_mm, d_2.calendar_year_month"
" Index Cond: (d_2.the_date = r_1.departure_date)"
" CTE arrival_dates"
" -> Unique (cost=8103.16..8149.19 rows=6137 width=12)"
" Output: d_3.date_id, d_3.the_date"
" -> Sort (cost=8103.16..8118.51 rows=6137 width=12)"
" Output: d_3.date_id, d_3.the_date"
" Sort Key: d_3.date_id, d_3.the_date"
" -> Nested Loop (cost=0.42..7717.04 rows=6137 width=12)"
" Output: d_3.date_id, d_3.the_date"
" -> CTE Scan on relationships r_2 (cost=0.00..1642.86 rows=365 width=8)"
" Output: r_2.from_location_code, r_2.to_location_code, r_2.departure_date, r_2.arrival_date, r_2.service_id, r_2.customer_id, r_2.ticketno, r_2.leg_number, r_2.leg_direction, r_2.brand_id, r_2.service_direction, r_2.flight_id, r_2.booking_reference, r_2.transaction_id, r_2.ticket_id, r_2.ticket_number, r_2.sch1_batch_file_id"
" Filter: (r_2.sch1_batch_file_id = 234)"
" -> Index Scan using idx_lkup_dates_date on sch2.lkup_dates d_3 (cost=0.42..16.47 rows=17 width=12)"
" Output: d_3.date_id, d_3.the_date, d_3.date_iso, d_3.date_text, d_3.date_abbreviation, d_3.day_of_week_number, d_3.day_of_week_name, d_3.day_of_week_short_name, d_3.first_date_of_week, d_3.last_date_of_week, d_3.day_of_month, d_3.day_of_month_number, d_3.month_name, d_3.month_name_with_year, d_3.month_short_name, d_3.month_short_name_with_year, d_3.day_of_month_name, d_3.first_date_of_month, d_3.last_date_of_month, d_3.calendar_week_number, d_3.calendar_week_name, d_3.calendar_week_name_with_year, d_3.calendar_week_short_name, d_3.calendar_week_short_name_with_year, d_3.calendar_week_yy_wk, d_3.calendar_month_number, d_3.calendar_month_yy_mm, d_3.calendar_year_month"
" Index Cond: (d_3.the_date = r_2.arrival_date)"
" -> Merge Right Join (cost=5708584.97..18520371.52 rows=425901878 width=678)"
" Output: nextval('sch2.travle_legs_journey_leg_id_seq'::regclass), r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, f.from_location_id, t.to_location_id, r.service_id, r.service_direction, r.departure_date, r.arrival_date, d.departure_date_id, a.arrival_date_id, NULL::integer, r.flight_id, r.booking_reference, NULL::timestamp without time zone, NULL::integer, NULL::timestamp without time zone, NULL::integer, NULL::timestamp without time zone, NULL::integer, NULL::timestamp without time zone, NULL::integer, NULL::integer, NULL::integer, NULL::boolean, NULL::integer, NULL::integer, NULL::boolean"
" Merge Cond: (a.arrival_date = r.arrival_date)"
" -> Sort (cost=508.86..524.20 rows=6137 width=12)"
" Output: a.arrival_date_id, a.arrival_date"
" Sort Key: a.arrival_date"
" -> CTE Scan on arrival_dates a (cost=0.00..122.74 rows=6137 width=12)"
" Output: a.arrival_date_id, a.arrival_date"
" -> Materialize (cost=5708076.12..5777475.15 rows=13879807 width=180)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id, d.departure_date_id"
" -> Sort (cost=5708076.12..5742775.63 rows=13879807 width=180)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id, d.departure_date_id"
" Sort Key: r.arrival_date"
" -> Merge Right Join (cost=151743.89..361102.51 rows=13879807 width=180)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id, d.departure_date_id"
" Merge Cond: (d.departure_date = r.departure_date)"
" -> Sort (cost=508.86..524.20 rows=6137 width=12)"
" Output: d.departure_date_id, d.departure_date"
" Sort Key: d.departure_date"
" -> CTE Scan on departure_dates d (cost=0.00..122.74 rows=6137 width=12)"
" Output: d.departure_date_id, d.departure_date"
" -> Materialize (cost=151235.03..153496.69 rows=452332 width=176)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
" -> Sort (cost=151235.03..152365.86 rows=452332 width=176)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
" Sort Key: r.departure_date"
" -> Hash Left Join (cost=35.82..31437.17 rows=452332 width=176)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
" Hash Cond: (r.transaction_id = xaction.transaction_id)"
" -> Hash Left Join (cost=23.73..15046.30 rows=243189 width=176)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, f.from_location_id, t.to_location_id"
" Hash Cond: ((r.to_location_code)::text = (t.to_location_code)::text)"
" -> Hash Left Join (cost=11.86..6272.98 rows=133254 width=204)"
" Output: r.transaction_id, r.customer_id, r.ticket_id, r.brand_id, r.leg_number, r.leg_direction, r.service_id, r.service_direction, r.departure_date, r.arrival_date, r.flight_id, r.booking_reference, r.to_location_code, f.from_location_id"
" Hash Cond: ((r.from_location_code)::text = (f.from_location_code)::text)"
" -> CTE Scan on relationships r (cost=0.00..1460.32 rows=73016 width=232)"
" Output: r.from_location_code, r.to_location_code, r.departure_date, r.arrival_date, r.service_id, r.customer_id, r.ticketno, r.leg_number, r.leg_direction, r.brand_id, r.service_direction, r.flight_id, r.booking_reference, r.transaction_id, r.ticket_id, r.ticket_number, r.sch1_batch_file_id"
" -> Hash (cost=7.30..7.30 rows=365 width=36)"
" Output: f.from_location_id, f.from_location_code"
" -> CTE Scan on from_location f (cost=0.00..7.30 rows=365 width=36)"
" Output: f.from_location_id, f.from_location_code"
" -> Hash (cost=7.30..7.30 rows=365 width=36)"
" Output: t.to_location_id, t.to_location_code"
" -> CTE Scan on to_location t (cost=0.00..7.30 rows=365 width=36)"
" Output: t.to_location_id, t.to_location_code"
" -> Hash (cost=7.44..7.44 rows=372 width=4)"
" Output: xaction.transaction_id"
" -> CTE Scan on xaction (cost=0.00..7.44 rows=372 width=4)"
" Output: xaction.transaction_id" Output: xaction.transaction_id"
Upvotes: 1
Views: 1928
Reputation: 1297
It is difficult to say one definitive action you could take that will solve the runtime, but I will suggest a few things you can try to see if they improve the query's performance.
In the relationships CTE, you should be using a GROUP BY
clause to ensure that the records returned are unique. Because all of your subsequent CTEs join back to relationships, if relationships contains duplicate records, this can cause trouble for later CTEs depending on the join type that the query planner chooses.
Your query plan lists three nested loop joins; one in each of the following CTEs: xaction, departure_dates, and arrival_dates. I would check how many records are actually returned by the tables/CTEs that are involved in the nested loop joins (presumably sch2.lkup_dates, relationships, sch2.fact_ticket_transactions, and sch2.tickets). Oftentimes the query planner will select a nested loop join when it believes that the total number of records involved is small, but sometimes it is wildly incorrect in this assumption which can lead to exponential runtime increases depending on where the nested loop ends up in the execution plan.
This is more a matter of practice rather than a performance gain, but because you are filtering relationships to only include records that contain batch_id = 234
, you don't need to include similar WHERE
clauses in all subsequent CTEs that are already joining back to the relationships CTE.
If you find that the tables involved in the nested loop joins are very large, I would suggest creating a temporary table that contains the output of joining the tables prior to executing this query, to ensure that the nested loop does not appear in the query plan.
Edit to add Temp Table Example:
The general idea behind using a temporary table is that it forces a dataset to be constructed prior to the execution of the main body of a query, which will coerce the query planner into choosing a better plan:
DROP TABLE IF EXISTS temp_departure_dates;
CREATE TEMPORARY TABLE temp_departure_dates as (
SELECT d.date_id AS departure_date_id
, d.the_date AS departure_date
FROM sch2.dates d
JOIN sch1.booked_travel b ON b.departure_date = d.the_date
WHERE b.batch_id = 234
GROUP BY 1, 2
);
WITH relationships AS (
-- ...
)
SELECT r.transaction_id,
r.ticket_id,
r.leg_number,
r.leg_direction,
f.from_location_id,
t.to_location_id,
r.service_direction,
d.departure_date_id,
r.departure_date,
r.arrival_date,
a.arrival_date_id,
r.booking_reference
FROM relationships r
LEFT JOIN xaction USING (transaction_id)
LEFT JOIN from_location f ON f.from_location_code = r.from_location_code
LEFT JOIN to_location t ON t.to_location_code = r.to_location_code
left join temp_departure_dates d on r.departure_date = d.departure_date
left join arrival_dates a on a.arrival_date = r.arrival_date
RETURNING *;
Upvotes: 1