Can I speed up this subquery nested PostgreSQL Query

Question

I have the following PostgreSQL code (which works, but slowly) which I'm using to create a materialized view, however it is quite slow and length of code seems cumbersome with the multiple sub-queries. Is there anyway I can improve the speed this code executes at or rewrite so it's shorter and easier to maintain?

CREATE MATERIALIZED VIEW station_views.obs_10_min_avg_ffdi_powerbi AS 
 SELECT t.station_num,
    initcap(t.station_name) AS station_name,
    t.day,
    t.month_int,
    to_char(to_timestamp(t.month_int::text, 'MM'), 'TMMonth') AS Month,
    round(((date_part('year', age(t2.dmax, t2.dmin)) * 12 + date_part('month', age(t2.dmax, t2.dmin))) / 12)::numeric, 1) AS record_years,
    round((t2.count_all_vals / t2.max_10_periods * 100)::numeric, 1) AS per_datset,
    max(t.avg_bom_fdi) AS max,
    avg(t.avg_bom_fdi) AS avg,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY t.avg_bom_fdi) AS percentile_cont_95,
    percentile_cont(0.99) WITHIN GROUP (ORDER BY t.avg_bom_fdi) AS percentile_cont_99
   FROM ( SELECT a.station_num,
            d.station_name,
            a.ten_minute_intervals_utc,
            date_part('day', a.ten_minute_intervals_utc) AS day,
            date_part('month', a.ten_minute_intervals_utc) AS month_int,
            a.avg_bom_fdi
           FROM analysis.obs_10_min_avg_ffdi_bom a,
            obs_minute_stn_det d
          WHERE d.station_num = a.station_num) t,
    ( SELECT obs_10_min_avg_ffdi_bom_view.station_num,
            obs_10_min_avg_ffdi_bom_view.station_name,
            min(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc) AS dmin,
            max(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc) AS dmax,
            date_part('epoch', max(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc) - min(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc)) / 600 AS max_10_periods,
            count(*) AS count_all_vals
           FROM analysis.obs_10_min_avg_ffdi_bom_view
          GROUP BY obs_10_min_avg_ffdi_bom_view.station_num, obs_10_min_avg_ffdi_bom_view.station_name) t2
  WHERE t.station_num = t2.station_num
  GROUP BY t.station_num, t.station_name, Month, t.month_int, t.day, record_years, per_datset
  ORDER BY t.month_int, t.day
WITH DATA;

The output I get is a row for each weather station (station_num & station_name) along with the day & month that a weather variable is recorded (avg_bom_fdi). The month value is retained and converted to a name for purposes of plotting values averaged per month on the chart. I also pull in the total number of years that recordings exist for that station (record_years) and a percentage of how complete that dataset is (per_datset). These both come from the second subquery (t2). The first subquery (t) is used to average the data per day and return the daily max, average and 95/99th percentiles.

Ankur Srivastava · Accepted Answer

I agree with the running the explain plan / execution plan on this query.
Also , if not needed remove order by
If you see , lot of time spent on fetching a particular value while reviewing execution plan, try creating an index on that particular column.
Depending on high and low cardinality , you can create B-Tree or Bit Map index,if you are deciding on index.

Can I speed up this subquery nested PostgreSQL Query

Answers (2)

Related Questions