Tomás Juárez
Tomás Juárez

Reputation: 1514

Query taking too long

I'm working on PostgreSQL (latest version), in a simple SCADA system. The problem is that every query involving the largest table in my schema [figure 1] takes about 2 hours.

Schema

What I want is to obtain the constructora's PKs for buildings (edificio) whose departments (departamento) record (valor in table medicion) surpluses in the measurement of "gas" or "electricidad" in 2010. So...

Tables' Size (in rows).

Query

I've made a query with INNER JOINs that goes from the littlest tables (constructora and variable) to the biggest one (medicion).

SELECT DISTINCT C.id_constructora FROM
constructora C
INNER JOIN variable V ON (V.nombre = 'electricidad' OR V.nombre = 'gas')
INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
INNER JOIN medicion M ON (M.id_variable = V.id_variable)
WHERE (
    (M.valor > V.valmax) AND
    EXTRACT(YEAR FROM M.fecha) = 2010
);

EXPLAIN

"HashAggregate  (cost=2343438.58..2343438.68 rows=10 width=4)"
"  Group Key: c.id_constructora"
"  ->  Hash Join  (cost=164536.25..1947605.25 rows=158333333 width=4)"
"        Hash Cond: (e.id_constructora = c.id_constructora)"
"        ->  Hash Join  (cost=4.25..1510.75 rows=50000 width=4)"
"              Hash Cond: (d.id_edificio = e.id_edificio)"
"              ->  Seq Scan on departamento d  (cost=0.00..819.00 rows=50000 width=4)"
"              ->  Hash  (cost=3.00..3.00 rows=100 width=8)"
"                    ->  Seq Scan on edificio e  (cost=0.00..3.00 rows=100 width=8)"
"        ->  Hash  (cost=164136.12..164136.12 rows=31670 width=4)"
"              ->  Nested Loop  (cost=0.00..164136.12 rows=31670 width=4)"
"                    ->  Nested Loop  (cost=0.00..163739.12 rows=3167 width=0)"
"                          Join Filter: ((m.valor > v.valmax) AND (v.id_variable = m.id_variable))"
"                          ->  Seq Scan on medicion m  (cost=0.00..162408.00 rows=38000 width=8)"
"                                Filter: (date_part('year'::text, fecha) = '2010'::double precision)"
"                          ->  Materialize  (cost=0.00..1.13 rows=2 width=24)"
"                                ->  Seq Scan on variable v  (cost=0.00..1.12 rows=2 width=24)"
"                                      Filter: (((nombre)::text = 'electricidad'::text) OR ((nombre)::text = 'gas'::text))"
"                    ->  Materialize  (cost=0.00..1.15 rows=10 width=4)"
"                          ->  Seq Scan on constructora c  (cost=0.00..1.10 rows=10 width=4)"

My question is what can I do to reduce -significantly- the execution time of my query?

Upvotes: 0

Views: 52

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270873

The or in the join is a performance killer. Perhaps this does what you want:

SELECT DISTINCT C.id_constructora
FROM constructora C INNER JOIN
     edificio E
     ON E.id_constructora = C.id_constructora INNER JOIN
     departamento D
     ON E.id_edificio = D.id_edificio LEFT JOIN
     variable ve
     ON ve.nombre = 'electricidad' AND M.valor > ve.valmax LEFT JOIN
     variable vg
     ON vg.nombre = 'gas' LEFT JOIN
     medicion me
     ON me.id_variable = ve.id_variable AND
        EXTRACT(YEAR FROM me.fecha) = 2010 AND
         me.valor > ve.valmax LEFT JOIN
     medicion mg
     ON mg.id_variable = vg.id_variable AND
        EXTRACT(YEAR FROM mg.fecha) = 2010 AND
        mg.valor > vg.valmax
WHERE EXTRACT(YEAR FROM M.fecha) = 2010 AND
      (me.id_variable IS NOT NULL OR mg.id_variable IS NOT NULL);

I also suspect that you want join conditions on mg.id_departamento = d.id_departamento and me.id_departamento = d.id_departamento in the appropriate on clauses.

This assumes that you have appropriate indexes on the join keys (particularly the primary keys). In addition, you want an index on variable(nombre, valor) and medicion(id_variable, fecha, valor). If you are using id_departamento, it should be the first key in the index.

Upvotes: 1

Abelisto
Abelisto

Reputation: 15624

It seems (according to the schema) your query should be:

SELECT DISTINCT C.id_constructora FROM
constructora C
  INNER JOIN edificio E ON (E.id_constructora = C.id_constructora)
  INNER JOIN departamento D ON (E.id_edificio = D.id_edificio)
  -- Changes here
  INNER JOIN medicion M ON (D.id_departamento = M.id_departamento)
  INNER JOIN variable V ON (M.id_variable = V.id_variable AND (V.nombre = 'electricidad' OR V.nombre = 'gas'))
WHERE (
    (M.valor > V.valmax) AND
    EXTRACT(YEAR FROM M.fecha) = 2010
);

Because in your original query you have the cross join of medicion and variable tables with all other tables.

Also you should to have the functional index like

create index idx_medicion_fecha_year on medicion(EXTRACT(YEAR FROM M.fecha));

to be used for EXTRACT(YEAR FROM fecha) = 2010 condition.

Upvotes: 1

Related Questions