mugdiman
mugdiman

Reputation: 101

performance of a pgplsql stored procedure?

We have the following stored procedure that is recently performing very slow on larger amount of date in postgres db. Questions:

We parse a string in the nature of (the first number is the id of the row, the second is the state)

||2|0||3|1||4|0||

Is it better to parse, split the string and loop in a higher language like java? Can the loop be more efficient in Postgres? How are transactions handled in stored procedures? The whole function is one transaction? Probably we are doing to much write and delete operations on the db. The deletes also take too long. Can that be handled more efficient?

CREATE OR REPLACE FUNCTION verificaemitidos(entrada text, largo_mensaje integer)
  RETURNS character AS
$BODY$
    DECLARE                 
    texto_procesado text;       
    identificador bigint;
    estado_mensaje int;
    i int;
    existe_documento int;
    estado_documento text;
    rut numeric;
    tipo int;
    folio_doc numeric;
    otros_estados int;
    BEGIN
        --estado 1 insertado
        --estado 0 no insertado
        --mensaje id_documento|estado||id_documento|estado||


        i := 1;
        while (i <= largo_mensaje)
        loop            
            --Proceso el mensaje
            texto_procesado := split_part(entrada,'||', i) ;
            identificador := split_part(texto_procesado, '|', 1);   
            estado_mensaje := split_part(texto_procesado, '|', 2);              
            -- Se comienza a hacer la comparacion
            existe_documento := (select count (id) from uris_emitidos where id = identificador);
            select estado, emp_rut, tipo_doc, folio into estado_documento, rut, tipo, folio_doc from uris_emitidos where id = identificador;

            --si existe el documento            
            if (existe_documento > 0) then              
                --si el documento que se ingreso esta insertado
                if (estado_mensaje = 1) then
                    --si esta aceptado se eliminan todos los documentos con ese rut, tipo, folio
                    if (estado_documento = 'A') then
                        delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo;
                    end if;
                    --si esta aceptado con reparo se eliminan todos los documentos con ese rut, tipo, folio
                    if (estado_documento = 'B') then
                        delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo;
                    end if;
                    --si esta rechazado se elimina el rechazado y el publicado
                    if (estado_documento = 'R') then
                        delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo and estado in ('R', 'P');
                    end if;
                    --si esta publicado se elimina
                    if (estado_documento = 'P') then
                        delete from uris_emitidos where id = identificador;
                    end if;
                --si el documento que se ingreso no esta insertado              
                else
                    --si esta aceptado se actualiza para que el proceso lo re-encole
                    if (estado_documento = 'A') then 
                        update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;                      
                    end if;
                    --si esta aceptado con reparo se actualiza para que el proceso lo re-encole
                    if (estado_documento = 'B') then
                        update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;                      
                    end if;
                    --si esta rechazado se verifica que no existe un registro aceptado que se haya encolado o este en espera de encolar
                    if (estado_documento = 'R') then
                        otros_estados = (select count(id) from uris_emitidos ue where ue.folio = folio_doc and ue.emp_rut = rut and ue.tipo_doc = tipo and ue.estado in ('A', 'B'));
                        --si otros estados = 0 significa que el estado rechazado es el mejor estado que hay, por lo tanto se debe re-encolar
                        if (otros_estados = 0) then
                            update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;
                        end if;
                    end if;
                    --si esta rechazado se verifica que no existe un registro aceptado o rechazado que se haya encolado o este en espera de encolar
                    if (estado_documento = 'P') then
                        otros_estados = (select count(id) from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo and estado in ('A', 'B', 'R'));
                        --si otros estados = 0 significa que el estado rechazado es el mejor estado que hay, por lo tanto se debe re-encolar
                        if (otros_estados = 0) then
                            update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;
                        end if;
                    end if;

                end if;

            end if;

            i := i+1;
        end loop;
        return 'ok';


    END;
$BODY$
  LANGUAGE plpgsql VOLATILE;

Upvotes: 2

Views: 3344

Answers (4)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658947

Your posted answer can be improved and simplified:

CREATE OR REPLACE FUNCTION x.verificaemitidos3(_entrada text)
  RETURNS text AS
$BODY$
BEGIN
   --estado 1 insertado
   --estado 0 no insertado

   CREATE TEMP TABLE worklist ON COMMIT DROP AS
   SELECT split_part(t, '|', 1)::bigint AS identificador
         ,split_part(t, '|', 2)::integer AS estado_mensaje
         ,uri.emp_rut AS rut_emisor
         ,uri.tipo_doc
         ,uri.folio
         ,uri.estado
   FROM  (SELECT unnest(string_to_array(trim(_entrada::text, '|'), '||'))) a(t)
   JOIN   uris_emitidos uri ON uri.id = split_part(t, '|', 1)::bigint;

   -- ESTADO 1

   DELETE FROM uris_emitidos u
   USING  worklist w
   WHERE  w.estado_mensaje = 1
   AND   (
         (w.estado IN ('A', 'B')   -- CASOS 1+2
      OR  w.estado =   'R'         -- CASO 3
      AND u.estado IN ('R', 'P')
      )
      AND u.folio = w.folio
      AND u.emp_rut = w.rut_emisor
      AND u.tipo_doc =  w.tipo_doc

      OR (w.estado = 'P'           -- CASO 4
      AND w.identificador = u.id
      )
      );

   -- ESTADO 0

   UPDATE uris_emitidos u
   SET    estado_envio = 0
         ,cont = cont + 1
   FROM   worklist w
   WHERE  w.estado_mensaje = 0
   AND    w.identificador = u.id
   AND   (w.estado IN ('A', 'B')   -- CASOS 1+2

      OR  w.estado = 'R'           -- CASO 3
      AND NOT EXISTS (
         SELECT 1
         FROM   uris_emitidos ue
         WHERE  ue.folio = w.folio
         AND    ue.emp_rut = w.rut_emisor
         AND    ue.tipo_doc = w.tipo_doc
         AND    ue.estado IN ('A', 'B')
         )

      OR  w.estado = 'P'         -- CASO 4
      AND NOT EXISTS (
         SELECT 1
         FROM   uris_emitidos ue
         WHERE  ue.folio = w.folio
         AND    ue.emp_rut = w.rut_emisor
         AND    ue.tipo_doc = w.tipo_doc
         AND    ue.estado IN ('A', 'B', 'R')
         )
      );

   RETURN 'ok';
END;
$BODY$  LANGUAGE plpgsql VOLATILE;

Major points

  • Half as long and twice as fast.
  • Remove 2nd function parameter. You don't need it any more.
  • Remove all variables. None of them are used any more.
  • Use ON COMMIT DROP to avoid conflict with existing temp table on repeated calls in one session. In Postgres 9.1 you could use CREATE TABLE IF EXISTS
  • Create temp table directly from results of SELECT with CREATE TABLE AS.
  • Combine all DELETEs.
  • Combine all UPDATEs.
  • Trim some noise.
  • Use return type text instead of character.

Upvotes: 0

mugdiman
mugdiman

Reputation: 101

The solution I come up with in the moment is the following. Definitely faster than looping though and reducing the number of writes and reads on the db. Thanks

CREATE OR REPLACE FUNCTION verificaemitidos2(entrada text, largo_mensaje integer)
  RETURNS character AS
$BODY$
    DECLARE
    texto_procesado text;
    identificador bigint;
    estado_mensaje int;
    i int;
    existe_documento int;
    estado_documento text;
    rut numeric;
    tipo int;
    folio_doc numeric;
    otros_estados int;
    BEGIN
        --estado 1 insertado
        --estado 0 no insertado
        --mensaje id_documento|estado||id_documento|estado||

        --DROP TABLE worklist;
        CREATE TEMP TABLE worklist
            ( identificador bigint,
              estado_mensaje int,
              rut_emisor numeric,
              tipo_doc numeric,
              folio numeric,
              estado text
            );

        INSERT INTO worklist (identificador, estado_mensaje, rut_emisor, tipo_doc, folio, estado)
            SELECT split_part(t, '|', 1)::bigint ,
              split_part(t, '|', 2)::integer ,
              uri.emp_rut,
              uri.tipo_doc,
              uri.folio,
              uri.estado
              from (SELECT unnest(string_to_array(trim(entrada::text, '|'), '||'))) as a(t),
              uris_emitidos uri
              WHERE uri.id = split_part(t, '|', 1)::bigint;

        -- ESTADO 1
        -- ACEPTADOS PRIMEROS DOS CASOS

        DELETE FROM uris_emitidos u
         USING  worklist wl
         WHERE  wl.estado_mensaje = 1
           AND  wl.estado IN ('A', 'B')
           AND  u.folio = wl.folio
           AND  u.emp_rut = wl.rut_emisor
           AND  u.tipo_doc =  wl.tipo_doc;

        -- ESTADO 1
        -- CASO 3

        --delete from uris_emitidos where folio = folio_doc and emp_rut = rut and tipo_doc = tipo and estado in ('R', 'P');

        DELETE FROM uris_emitidos u
         USING  worklist wl
         WHERE  wl.estado_mensaje = 1
           AND  wl.estado IN ('R')
           AND  u.estado IN ('R', 'P')
           AND  u.folio = wl.folio
           AND  u.emp_rut = wl.rut_emisor
           AND  u.tipo_doc =  wl.tipo_doc;

        -- ESTADO 1
        -- CASO 4

         DELETE FROM uris_emitidos u
         USING  worklist wl
         WHERE  u.id = wl.identificador
           AND  wl.estado_mensaje = 1
           AND  wl.estado = 'P';

        -- ESTADO 0
        -- CASOS 1+2

        UPDATE uris_emitidos u
        SET estado_envio = 0, cont =  (u.cont + 1)
        FROM worklist wl
        WHERE  u.id = wl.identificador
        AND  wl.estado_mensaje = 0
        AND  wl.estado IN ('A' , 'B');

         -- update uris_emitidos set estado_envio = 0, cont = (cont + 1) where id = identificador;

        -- ESTADO 0
        -- CASO 3

        UPDATE uris_emitidos u
        SET estado_envio = 0, cont =  (u.cont + 1)
        FROM worklist wl
        WHERE  u.id = wl.identificador
        AND  wl.estado_mensaje = 0
        AND  wl.estado IN ('R')
        AND NOT EXISTS (
        SELECT 1 FROM uris_emitidos ue
        WHERE ue.folio = wl.folio
        AND ue.emp_rut = wl.rut_emisor
        AND ue.tipo_doc = wl.tipo_doc
        AND ue.estado IN ('A', 'B'));

        -- ESTADO 0
        -- CASO 4

        UPDATE uris_emitidos u
        SET estado_envio = 0, cont =  (u.cont + 1)
        FROM worklist wl
        WHERE  u.id = wl.identificador
        AND  wl.estado_mensaje = 0
        AND  wl.estado IN ('P')
        AND NOT EXISTS (
        SELECT 1 FROM uris_emitidos ue
        WHERE ue.folio = wl.folio
        AND ue.emp_rut = wl.rut_emisor
        AND ue.tipo_doc = wl.tipo_doc
        AND ue.estado IN ('A', 'B', 'R'));

        DROP TABLE worklist;

        RETURN 'ok';
    END;
$BODY$
  LANGUAGE plpgsql VOLATILE;

Upvotes: 0

wildplasser
wildplasser

Reputation: 44250

When you look at it, the arguments (except largo_mensaje) to the dreaded function could be viewed as the fields of a worklist:

CREATE TABLE worklist
    ( texto_procesado text
    , identificador bigint
    , estado_mensaje int
    );

And the corresponding work could be performed like ( I borrowed this from Erwin's answer):

DELETE FROM uris_emitidos u
 USING  worklist wl
 WHERE  u.id = wl.identificador
   AND    u.estado IN ('A','B','R','P')
   AND    wl.estado_mensaje = 1
   AND    wl.texto_procesado IN ('action1' , 'action2', ...)
    ;

, and afterwards the worklist has to been cleaned up ( AND NOT EXISTS (SELECT * FROM uris_emitidos WHERE ));

Upvotes: 0

Erwin Brandstetter
Erwin Brandstetter

Reputation: 658947

Can the loop be more efficient in pgsql?

As @wildplasser mentioned, it is generally much faster to run SQL statements that manipulate sets of rows than to manipulate each row individually. Loops are only possible in plpgsql (or other procedural language functions or, in a limited fashion, in recursive CTEs), not in plain SQL. They do their job just fine, but are not the strong suit of PostgreSQL.

How are transactions handled in stored procedures? The whole function is one transaction?

Yes, the whole function runs as one transaction. It can be part of a bigger transaction, but it cannot be split up.

Read more about how plpgsql functions work in this related answer on dba.SE.

Is it better to parse, split the string and loop in a higher language like java?

If the string isn't huge (many thousand elements), it really doesn't matter, as long as your logic is sound. It's not the string parsing that slows you down. It's the "row at a time" manipulation of rows in the table.

The much faster alternative would be do it all in one or few SQL statements. I would use data modifying CTEs for this (introduced with PostgreSQL 9.1): Parse the string once, and run DML statements on this internal working table.

Consider the following demo (untested):

WITH a(t) AS (  -- split string into rows
    SELECT unnest(string_to_array(trim('||2|0||3|1||4|0||'::text, '|'), '||'))
    )
    , b AS (    -- split record into columns per row
    SELECT split_part(t, '|', 1) AS identificador 
          ,split_part(t, '|', 2) AS estado_mensaje 
    FROM   a
    )
    , c AS (    -- implements complete IF branch of your base loop
    DELETE FROM uris_emitidos u
    USING  b
    WHERE  u.id = b.identificador
    AND    u.estado IN ('A','B','R','P')
    AND    b.estado_mensaje = 1
    )
--  , d AS (    -- implements ELSE branch of your loop
--  DELETE ...
--  )
SELECT 'ok':

Adding to the principal design flaws, the logic in your loops is redundant and inconsistent. I consolidated the whole IF branch into the first DELETE statement above.

More information about the function used in the manual here.

Upvotes: 3

Related Questions