Sammy Pawar
Sammy Pawar

Reputation: 1281

Improve the performance

I have two sets of data coming from external source - purchase date of the customer and last email click/open date of the customer. This is stored in two tables PURCHASE_INTER AND ACTIVITY_INTER tables respectively. Purchase data is in multiple and I need to pick up last purchase date. But activity data is unique for each customer. The data is independent of each other and other set of data may not be present. We have written below query which combines two tables, groups them based on person_id which is id of the customer coming from external source and get the latest of the dates, join with our customer table to get the customer email, and again join with another table where this data will be ultimately stored for sake of knowing whether it is insert or update operation. Could you please suggest how I can improve the performance of this query. It is terribly slow and taking more than 10 hours. There are millions of records coming in PURCHASE_INTER AND ACTIVITY_INTER tables.

SELECT INTER.*, C.ID AS CUSTOMER_ID, C.EMAIL AS CUSTOMER_EMAIL, LSI.ID AS INTERACTION_ID, ROW_NUMBER() OVER (ORDER BY PERSON_ID ASC) AS RN FROM (
   SELECT PERSON_ID               AS PERSON_ID,
        MAX(LAST_CLICK_DATE)    AS LAST_CLICK_DATE,
        MAX(LAST_OPEN_DATE)     AS LAST_OPEN_DATE,
        MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
   FROM (
     SELECT ACT.PERSON_ID AS PERSON_ID,
          ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
          ACT.LAST_OPEN_DATE AS LAST_OPEN_DATE,
          NULL AS LAST_PURCHASE_DATE
     FROM ACTIVITY_INTER ACT
     WHERE ACT.JOB_ID = 77318317
     UNION
     SELECT PUR.PERSON_ID AS PERSON_ID,
          NULL AS LAST_CLICK_DATE,
          NULL AS LAST_OPEN_DATE,
          PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
     FROM PURCHASE_INTER PUR
     WHERE PUR.JOB_ID = 77318317
   ) GROUP BY PERSON_ID
 ) INTER LEFT JOIN CUSTOMER C ON INTER.PERSON_ID = C.PERSON_ID
         LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;

Upvotes: 0

Views: 71

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271151

Your query suggests the following indexes:

  • ACTIVITY_INTER(JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE)
  • PURCHASE_INTER(JOB_ID, PERSON_ID, LAST_PURCHASE_DATE)
  • CUSTOMER(PERSON_ID)
  • INTERACTION(CUSTOMER_ID)

(For the first two indexes, the first column is more important than the other two, unless the number of matches is very large.)

Also, change the UNION to UNION ALL. UNION incurs overhead for removing duplicates -- and this is not possible (at least between the two subqueries) because each subquery returns different columns.

In addition, you might want to replace the first subquery with a full outer join:

SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) as PERSON_ID,
       a.LAST_CLICK_DATE, a.LAST_OPEN_DATE,p.LAST_PURCHASE_DATE
FROM (SELECT ACT.PERSON_ID AS PERSON_ID,
             MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
             MAX(ACT.LAST_OPEN_DATE) AS LAST_OPEN_DATE
      FROM ACTIVITY_INTER ACT
      WHERE ACT.JOB_ID = 77318317
      GROUP BY ACT.PERSON_ID
     ) a FULL OUTER JOIN
     (SELECT PUR.PERSON_ID AS PERSON_ID,
             MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
      FROM PURCHASE_INTER PUR
      WHERE PUR.JOB_ID = 77318317
      GROUP BY PER.PERSON_ID
     ) p
     ON a.PERSON_ID = p.PERSON_ID

This gives Oracle more options for optimization, because the aggregation is done directly on the tables -- making indexes and better statistics available for the processing.

Upvotes: 5

Related Questions