Query taking too long

Question

I'm working on PostgreSQL (latest version), in a simple SCADA system. The problem is that every query involving the largest table in my schema [figure 1] takes about 2 hours.

What I want is to obtain the constructora's PKs for buildings (edificio) whose departments (departamento) record (valor in table medicion) surpluses in the measurement of "gas" or "electricidad" in 2010. So...

variable contains the max value for a record (valmax)
medicion contains the value of the metric (valor)

Tables' Size (in rows).

constructura -> 10
edificio -> 100
departamento -> 50.000
variable -> 8
medicion_departamento -> 400.000
medicion -> 8.000.000

Query

I've made a query with INNER JOINs that goes from the littlest tables (constructora and variable) to the biggest one (medicion).

SELECT DISTINCT C.id_constructora FROM
constructora C
INNER JOIN variable V ON (V.nombre = 'electricidad' OR V.nombre = 'gas')
INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
INNER JOIN medicion M ON (M.id_variable = V.id_variable)
WHERE (
    (M.valor > V.valmax) AND
    EXTRACT(YEAR FROM M.fecha) = 2010
);

EXPLAIN

"HashAggregate  (cost=2343438.58..2343438.68 rows=10 width=4)"
"  Group Key: c.id_constructora"
"  ->  Hash Join  (cost=164536.25..1947605.25 rows=158333333 width=4)"
"        Hash Cond: (e.id_constructora = c.id_constructora)"
"        ->  Hash Join  (cost=4.25..1510.75 rows=50000 width=4)"
"              Hash Cond: (d.id_edificio = e.id_edificio)"
"              ->  Seq Scan on departamento d  (cost=0.00..819.00 rows=50000 width=4)"
"              ->  Hash  (cost=3.00..3.00 rows=100 width=8)"
"                    ->  Seq Scan on edificio e  (cost=0.00..3.00 rows=100 width=8)"
"        ->  Hash  (cost=164136.12..164136.12 rows=31670 width=4)"
"              ->  Nested Loop  (cost=0.00..164136.12 rows=31670 width=4)"
"                    ->  Nested Loop  (cost=0.00..163739.12 rows=3167 width=0)"
"                          Join Filter: ((m.valor > v.valmax) AND (v.id_variable = m.id_variable))"
"                          ->  Seq Scan on medicion m  (cost=0.00..162408.00 rows=38000 width=8)"
"                                Filter: (date_part('year'::text, fecha) = '2010'::double precision)"
"                          ->  Materialize  (cost=0.00..1.13 rows=2 width=24)"
"                                ->  Seq Scan on variable v  (cost=0.00..1.12 rows=2 width=24)"
"                                      Filter: (((nombre)::text = 'electricidad'::text) OR ((nombre)::text = 'gas'::text))"
"                    ->  Materialize  (cost=0.00..1.15 rows=10 width=4)"
"                          ->  Seq Scan on constructora c  (cost=0.00..1.10 rows=10 width=4)"

My question is what can I do to reduce -significantly- the execution time of my query?

Abelisto · Accepted Answer

It seems (according to the schema) your query should be:

SELECT DISTINCT C.id_constructora FROM
constructora C
  INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
  INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
  -- Changes here
  INNER JOIN medicion M ON (D.id_departamento = M.id_departamento)
  INNER JOIN variable V ON (M.id_variable = V.id_variable AND (V.nombre = 'electricidad' OR V.nombre = 'gas'))
WHERE (
    (M.valor > V.valmax) AND
    EXTRACT(YEAR FROM M.fecha) = 2010
);

Because in your original query you have the cross join of medicion and variable tables with all other tables.

Also you should to have the functional index like

create index idx_medicion_fecha_year on medicion(EXTRACT(YEAR FROM M.fecha));

to be used for EXTRACT(YEAR FROM fecha) = 2010 condition.

Query taking too long

Tables' Size (in rows).

Query

EXPLAIN

Answers (2)

Related Questions