DevG
DevG

Reputation: 494

Better approach to write view

Below is the query which I find is very slow .

Basically I am trying to get latest status of each workflow_id along with other information by joining these three tables .

Query logic goes like this :

  1. Group by table batch step execution on job execution id and retrieving latest step using max(step_execution_id) and then retrieving all columns using step execution id
  2. Group by table batch_job_execution on job_instance_id and retrieve latest execution .
  3. Inner join 1 and 2  join based on latest execution and job execution of latest step
  4. Joining with id mapping table

This is my current code:

SELECT workflow_id,
       collabration_key,
       TAB3.START_TIME                 AS WORKFLOWDATE,
       batch_step_execution.STEP_NAME  AS CURRENT_STEP_NAME ,
       batch_step_execution.EXIT_CODE  AS CURRENTSTEP ,
       batch_step_execution.start_time AS STEPTIME ,
       TAB3.EXIT_CODE                  AS JOB_STATUS
FROM   batch_step_execution
       INNER JOIN (
         SELECT *
         FROM   rpx_id_mapping
                INNER JOIN (
                  SELECT batch_job_execution.job_execution_id,
                         batch_job_execution.job_instance_id ,
                         batch_job_execution.START_TIME ,
                         batch_job_execution.EXIT_CODE
                  FROM   batch_job_execution
                  WHERE  batch_job_execution.job_execution_id IN (
                           SELECT MAX(job_execution_id)
                           FROM   batch_job_execution
                           WHERE  job_instance_id IN (
                                    SELECT job_id
                                    FROM   rpx_id_mapping
                                  )
                           GROUP BY  job_instance_id
                        )
                ) TAB2
                ON rpx_id_mapping.job_id = TAB2.job_instance_id
       ) TAB3
       ON batch_step_execution.job_execution_id = TAB3.job_execution_id
WHERE  batch_step_execution.step_execution_id = (
         SELECT MAX(step_execution_id)
         FROM   batch_step_execution
         WHERE  batch_step_execution.job_execution_id = TAB3.job_execution_id
       )
) TAB4

Here is the tables structure.

enter image description here

enter image description here

enter image description here

Is there a better approach to achieve the same ?

Upvotes: 0

Views: 69

Answers (1)

MT0
MT0

Reputation: 167867

  1. Group by table batch step execution on job execution id and retrieving latest step using max(step_execution_id) and then retrieving all columns using step execution id
  2. Group by table batch_job_execution on job_instance_id and retrieve latest execution .
  3. Inner join 1 and 2 join based on latest execution and job execution of latest step
  4. Joining with id mapping table

I've tried to follow your logic and think this is the same using an analytical query to get the maximum row:

SELECT *
FROM   (
    SELECT workflow_id,
           collabration_key,
           bse.START_TIME AS WORKFLOWDATE,
           bse.STEP_NAME   AS CURRENT_STEP_NAME ,
           bse.EXIT_CODE   AS CURRENTSTEP ,
           bse.start_time  AS STEPTIME ,
           bse.EXIT_CODE  AS JOB_STATUS,
           ROW_NUMBER() OVER ( PARTITION BY rim.job_execution_id,
                                            rim.job_id
                               ORDER BY     bse.step_execution_id DESC,
                                            bse.job_execution_id  DESC ) AS rn
    FROM   batch_step_execution bse
           INNER JOIN rpx_id_mapping rim
           ON (  bse.job_execution_id = rim.job_execution_id)
           INNER JOIN batch_job_execution bje
           ON ( rim.job_id = bje.job_instance_id )
)
WHERE rn = 1;

If it isn't then hopefully it gives you an idea of how to simplify things.

Upvotes: 2

Related Questions