Reputation: 1514
I'm working on PostgreSQL (latest version), in a simple SCADA system. The problem is that every query involving the largest table in my schema [figure 1] takes about 2 hours.
What I want is to obtain the constructora's PKs for buildings (edificio
) whose departments (departamento
) record (valor
in table medicion
) surpluses in the measurement of "gas" or "electricidad" in 2010. So...
variable
contains the max value for a record (valmax
)medicion
contains the value of the metric (valor
)constructura -> 10
edificio -> 100
departamento -> 50.000
variable -> 8
medicion_departamento -> 400.000
medicion -> 8.000.000
I've made a query with INNER JOINs
that goes from the littlest tables (constructora
and variable
) to the biggest one (medicion
).
SELECT DISTINCT C.id_constructora FROM
constructora C
INNER JOIN variable V ON (V.nombre = 'electricidad' OR V.nombre = 'gas')
INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
INNER JOIN medicion M ON (M.id_variable = V.id_variable)
WHERE (
(M.valor > V.valmax) AND
EXTRACT(YEAR FROM M.fecha) = 2010
);
"HashAggregate (cost=2343438.58..2343438.68 rows=10 width=4)"
" Group Key: c.id_constructora"
" -> Hash Join (cost=164536.25..1947605.25 rows=158333333 width=4)"
" Hash Cond: (e.id_constructora = c.id_constructora)"
" -> Hash Join (cost=4.25..1510.75 rows=50000 width=4)"
" Hash Cond: (d.id_edificio = e.id_edificio)"
" -> Seq Scan on departamento d (cost=0.00..819.00 rows=50000 width=4)"
" -> Hash (cost=3.00..3.00 rows=100 width=8)"
" -> Seq Scan on edificio e (cost=0.00..3.00 rows=100 width=8)"
" -> Hash (cost=164136.12..164136.12 rows=31670 width=4)"
" -> Nested Loop (cost=0.00..164136.12 rows=31670 width=4)"
" -> Nested Loop (cost=0.00..163739.12 rows=3167 width=0)"
" Join Filter: ((m.valor > v.valmax) AND (v.id_variable = m.id_variable))"
" -> Seq Scan on medicion m (cost=0.00..162408.00 rows=38000 width=8)"
" Filter: (date_part('year'::text, fecha) = '2010'::double precision)"
" -> Materialize (cost=0.00..1.13 rows=2 width=24)"
" -> Seq Scan on variable v (cost=0.00..1.12 rows=2 width=24)"
" Filter: (((nombre)::text = 'electricidad'::text) OR ((nombre)::text = 'gas'::text))"
" -> Materialize (cost=0.00..1.15 rows=10 width=4)"
" -> Seq Scan on constructora c (cost=0.00..1.10 rows=10 width=4)"
My question is what can I do to reduce -significantly- the execution time of my query?
Upvotes: 0
Views: 52
Reputation: 1270873
The or
in the join
is a performance killer. Perhaps this does what you want:
SELECT DISTINCT C.id_constructora
FROM constructora C INNER JOIN
edificio E
ON E.id_constructora = C.id_constructora INNER JOIN
departamento D
ON E.id_edificio = D.id_edificio LEFT JOIN
variable ve
ON ve.nombre = 'electricidad' AND M.valor > ve.valmax LEFT JOIN
variable vg
ON vg.nombre = 'gas' LEFT JOIN
medicion me
ON me.id_variable = ve.id_variable AND
EXTRACT(YEAR FROM me.fecha) = 2010 AND
me.valor > ve.valmax LEFT JOIN
medicion mg
ON mg.id_variable = vg.id_variable AND
EXTRACT(YEAR FROM mg.fecha) = 2010 AND
mg.valor > vg.valmax
WHERE EXTRACT(YEAR FROM M.fecha) = 2010 AND
(me.id_variable IS NOT NULL OR mg.id_variable IS NOT NULL);
I also suspect that you want join conditions on mg.id_departamento = d.id_departamento
and me.id_departamento = d.id_departamento
in the appropriate on
clauses.
This assumes that you have appropriate indexes on the join keys (particularly the primary keys). In addition, you want an index on variable(nombre, valor)
and medicion(id_variable, fecha, valor)
. If you are using id_departamento
, it should be the first key in the index.
Upvotes: 1
Reputation: 15624
It seems (according to the schema) your query should be:
SELECT DISTINCT C.id_constructora FROM
constructora C
INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
-- Changes here
INNER JOIN medicion M ON (D.id_departamento = M.id_departamento)
INNER JOIN variable V ON (M.id_variable = V.id_variable AND (V.nombre = 'electricidad' OR V.nombre = 'gas'))
WHERE (
(M.valor > V.valmax) AND
EXTRACT(YEAR FROM M.fecha) = 2010
);
Because in your original query you have the cross join of medicion
and variable
tables with all other tables.
Also you should to have the functional index like
create index idx_medicion_fecha_year on medicion(EXTRACT(YEAR FROM M.fecha));
to be used for EXTRACT(YEAR FROM fecha) = 2010
condition.
Upvotes: 1