I have the following PostgreSQL code, which works but is slow, that I'm using to create a materialized view. The multiple sub-queries also make it cumbersome to read and maintain. Is there any way I can improve the speed at which this code executes, or rewrite it so it's shorter and easier to maintain?
CREATE MATERIALIZED VIEW station_views.obs_10_min_avg_ffdi_powerbi AS
SELECT t.station_num,
       initcap(t.station_name) AS station_name,
       t.day,
       t.month_int,
       to_char(to_timestamp(t.month_int::text, 'MM'), 'TMMonth') AS Month,
       round(((date_part('year', age(t2.dmax, t2.dmin)) * 12 + date_part('month', age(t2.dmax, t2.dmin))) / 12)::numeric, 1) AS record_years,
       round((t2.count_all_vals / t2.max_10_periods * 100)::numeric, 1) AS per_datset,
       max(t.avg_bom_fdi) AS max,
       avg(t.avg_bom_fdi) AS avg,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY t.avg_bom_fdi) AS percentile_cont_95,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY t.avg_bom_fdi) AS percentile_cont_99
FROM (SELECT a.station_num,
             d.station_name,
             a.ten_minute_intervals_utc,
             date_part('day', a.ten_minute_intervals_utc) AS day,
             date_part('month', a.ten_minute_intervals_utc) AS month_int,
             a.avg_bom_fdi
      FROM analysis.obs_10_min_avg_ffdi_bom a,
           obs_minute_stn_det d
      WHERE d.station_num = a.station_num) t,
     (SELECT obs_10_min_avg_ffdi_bom_view.station_num,
             obs_10_min_avg_ffdi_bom_view.station_name,
             min(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc) AS dmin,
             max(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc) AS dmax,
             date_part('epoch', max(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc) - min(obs_10_min_avg_ffdi_bom_view.ten_minute_intervals_utc)) / 600 AS max_10_periods,
             count(*) AS count_all_vals
      FROM analysis.obs_10_min_avg_ffdi_bom_view
      GROUP BY obs_10_min_avg_ffdi_bom_view.station_num, obs_10_min_avg_ffdi_bom_view.station_name) t2
WHERE t.station_num = t2.station_num
GROUP BY t.station_num, t.station_name, Month, t.month_int, t.day, record_years, per_datset
ORDER BY t.month_int, t.day
WITH DATA;
The output I get is a row for each weather station (station_num and station_name) along with the day and month on which a weather variable (avg_bom_fdi) is recorded. The month value is retained and converted to a name so that values averaged per month can be plotted on a chart. I also pull in the total number of years of recordings that exist for that station (record_years) and a percentage showing how complete that dataset is (per_datset); these both come from the second subquery (t2). The first subquery (t) is used to average the data per day and return the daily max, average and 95th/99th percentiles.
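For reference, here is a sketch of how I think the same logic could be written with CTEs and explicit JOIN syntax instead of the nested sub-queries and comma joins. It is untested, uses the same tables and columns as above, and assumes station_num alone identifies a station (so station_name is dropped from the second grouping, since the outer query never uses it):

-- Sketch only: intended to be equivalent to the view above, but untested.
CREATE MATERIALIZED VIEW station_views.obs_10_min_avg_ffdi_powerbi AS
WITH obs AS (
    -- One row per 10-minute observation, with its station name and date parts
    SELECT a.station_num,
           d.station_name,
           date_part('day', a.ten_minute_intervals_utc)   AS day,
           date_part('month', a.ten_minute_intervals_utc) AS month_int,
           a.avg_bom_fdi
    FROM analysis.obs_10_min_avg_ffdi_bom a
    JOIN obs_minute_stn_det d ON d.station_num = a.station_num
),
station_stats AS (
    -- Record length and completeness per station
    SELECT v.station_num,
           min(v.ten_minute_intervals_utc) AS dmin,
           max(v.ten_minute_intervals_utc) AS dmax,
           date_part('epoch', max(v.ten_minute_intervals_utc) - min(v.ten_minute_intervals_utc)) / 600 AS max_10_periods,
           count(*) AS count_all_vals
    FROM analysis.obs_10_min_avg_ffdi_bom_view v
    GROUP BY v.station_num
)
SELECT o.station_num,
       initcap(o.station_name) AS station_name,
       o.day,
       o.month_int,
       to_char(to_timestamp(o.month_int::text, 'MM'), 'TMMonth') AS month,
       round(((date_part('year', age(s.dmax, s.dmin)) * 12 + date_part('month', age(s.dmax, s.dmin))) / 12)::numeric, 1) AS record_years,
       round((s.count_all_vals / s.max_10_periods * 100)::numeric, 1) AS per_datset,
       max(o.avg_bom_fdi) AS max,
       avg(o.avg_bom_fdi) AS avg,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY o.avg_bom_fdi) AS percentile_cont_95,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY o.avg_bom_fdi) AS percentile_cont_99
FROM obs o
JOIN station_stats s ON s.station_num = o.station_num
GROUP BY o.station_num, o.station_name, month, o.month_int, o.day, record_years, per_datset
ORDER BY o.month_int, o.day
WITH DATA;

As far as I can tell this is mostly a readability change and the planner would likely choose a similar plan, so I suspect the speed question comes down more to indexing (e.g. on station_num and ten_minute_intervals_utc) than to the query shape, but I may be wrong about that.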