Reputation: 1281
I have two sets of data coming from external source - purchase date of the customer and last email click/open date of the customer. This is stored in two tables PURCHASE_INTER AND ACTIVITY_INTER tables respectively. Purchase data is in multiple and I need to pick up last purchase date. But activity data is unique for each customer. The data is independent of each other and other set of data may not be present. We have written below query which combines two tables, groups them based on person_id which is id of the customer coming from external source and get the latest of the dates, join with our customer table to get the customer email, and again join with another table where this data will be ultimately stored for sake of knowing whether it is insert or update operation. Could you please suggest how I can improve the performance of this query. It is terribly slow and taking more than 10 hours. There are millions of records coming in PURCHASE_INTER AND ACTIVITY_INTER tables.
SELECT INTER.*, C.ID AS CUSTOMER_ID, C.EMAIL AS CUSTOMER_EMAIL, LSI.ID AS INTERACTION_ID, ROW_NUMBER() OVER (ORDER BY PERSON_ID ASC) AS RN FROM (
SELECT PERSON_ID AS PERSON_ID,
MAX(LAST_CLICK_DATE) AS LAST_CLICK_DATE,
MAX(LAST_OPEN_DATE) AS LAST_OPEN_DATE,
MAX(LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
FROM (
SELECT ACT.PERSON_ID AS PERSON_ID,
ACT.LAST_CLICK_DATE AS LAST_CLICK_DATE,
ACT.LAST_OPEN_DATE AS LAST_OPEN_DATE,
NULL AS LAST_PURCHASE_DATE
FROM ACTIVITY_INTER ACT
WHERE ACT.JOB_ID = 77318317
UNION
SELECT PUR.PERSON_ID AS PERSON_ID,
NULL AS LAST_CLICK_DATE,
NULL AS LAST_OPEN_DATE,
PUR.LAST_PURCHASE_DATE AS LAST_PURCHASE_DATE
FROM PURCHASE_INTER PUR
WHERE PUR.JOB_ID = 77318317
) GROUP BY PERSON_ID
) INTER LEFT JOIN CUSTOMER C ON INTER.PERSON_ID = C.PERSON_ID
LEFT JOIN INTERACTION LSI ON C.ID = LSI.CUSTOMER_ID;
Upvotes: 0
Views: 71
Reputation: 1271151
Your query suggests the following indexes:
ACTIVITY_INTER(JOB_ID, PERSON_ID, LAST_CLICK_DATE, LAST_OPEN_DATE)
PURCHASE_INTER(JOB_ID, PERSON_ID, LAST_PURCHASE_DATE)
CUSTOMER(PERSON_ID)
INTERACTION(CUSTOMER_ID)
(For the first two indexes, the first column is more important than the other two, unless the number of matches is very large.)
Also, change the UNION
to UNION ALL
. UNION
incurs overhead for removing duplicates -- and this is not possible (at least between the two subqueries) because each subquery returns different columns.
In addition, you might want to replace the first subquery with a full outer join
:
SELECT COALESCE(a.PERSON_ID, p.PERSON_ID) as PERSON_ID,
a.LAST_CLICK_DATE, a.LAST_OPEN_DATE,p.LAST_PURCHASE_DATE
FROM (SELECT ACT.PERSON_ID AS PERSON_ID,
MAX(ACT.LAST_CLICK_DATE) AS LAST_CLICK_DATE,
MAX(ACT.LAST_OPEN_DATE) AS LAST_OPEN_DATE
FROM ACTIVITY_INTER ACT
WHERE ACT.JOB_ID = 77318317
GROUP BY ACT.PERSON_ID
) a FULL OUTER JOIN
(SELECT PUR.PERSON_ID AS PERSON_ID,
MAX(PUR.LAST_PURCHASE_DATE) AS LAST_PURCHASE_DATE
FROM PURCHASE_INTER PUR
WHERE PUR.JOB_ID = 77318317
GROUP BY PER.PERSON_ID
) p
ON a.PERSON_ID = p.PERSON_ID
This gives Oracle more options for optimization, because the aggregation is done directly on the tables -- making indexes and better statistics available for the processing.
Upvotes: 5