Reputation: 432
I have a problem. My DB is Oracle 9. Here is my SQL:
SELECT COUNT(distinct A.person_id)
  INTO p_record_count
  FROM A,
       B,
       F,
       T
 WHERE B.position_id = F.position_id              -- normal business association
   AND T.field2 = A.org_id                        -- normal business association
   AND F.Organization_Id = A.org_id               -- normal business association
   AND A.person_id = F.person_id                  -- normal business association
   AND F.primary_flag = 'Y'                       -- normal business association
   AND sysdate BETWEEN F.effective_start_date AND
       F.effective_end_date                       -- normal business association
   AND NOT EXISTS                                 -- very slow
       (SELECT log.person_id
          FROM cper.ehr_access_log log
         WHERE log.person_id = A.person_id
           AND log.access_page = 'login.do'
           AND trunc(log.access_time) BETWEEN
               to_date(p_startDate, 'yyyy-mm-dd') AND
               to_date(p_endDate, 'yyyy-mm-dd'))
   AND A.enable_flag = 'Y';                       -- normal business association
Requirement: get the people who did not log in.
The logic: get all the people I can, then subtract the people who did log in. I only have a table with the login records.
Table cper.ehr_access_log has more than 10M records. It is a log table. This SQL takes about 30 seconds. Is it possible to get it within 5 seconds?
Thank you! I will try. Have a good weekend! :)
Upvotes: 1
Views: 146
Reputation: 28752
Table cper.ehr_access_log has more than 10M records. It is a log table. This SQL takes about 30 seconds. Is it possible to get it within 5 seconds?
I would say it is not possible without adding some indexes, but you might want to try the following suggestions:
1. Use a query plan to identify the hotspot. That will likely be the table scan required for ehr_access_log, and you can probably use that argument to convince others that you need the index.
2. Try replacing the subquery with an outer join, as others have suggested.
3. You have joins on B and T but no other predicates on them. Do you really need them in the query?
4. You might be able to save some time by avoiding the trunc calls (not sure). As an alternative, set p_startDate to the beginning of the day and p_endDate to the end of the day and compare access_time directly; see the sketch below.
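For the first suggestion, a minimal sketch of getting the plan, assuming Oracle 9i Release 2 where DBMS_XPLAN is available (on earlier releases, query PLAN_TABLE directly); the PL/SQL variables would need to be swapped for binds or literals and the INTO clause dropped:

EXPLAIN PLAN FOR
SELECT COUNT(distinct A.person_id)
  FROM ... ;   -- the rest of the query from the question, without the INTO clause

SELECT * FROM TABLE(dbms_xplan.display);

For the last suggestion, a hedged sketch of the date filter without trunc, assuming access_time is a DATE and the parameters keep the 'yyyy-mm-dd' format used in the question; a half-open range keeps the predicate usable for an index range scan on access_time:

-- replaces: trunc(log.access_time) BETWEEN to_date(p_startDate, ...) AND to_date(p_endDate, ...)
AND log.access_time >= to_date(p_startDate, 'yyyy-mm-dd')
AND log.access_time <  to_date(p_endDate, 'yyyy-mm-dd') + 1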
Upvotes: 3
Reputation: 811
I think you can optimize your query by selecting only DISTINCT log.person_id records, which will reduce the number of rows coming from the log table; see the sketch after the query below.
You can also try to exclude the logged-in people by outer-joining the log table in the first part and keeping only the rows with no match:
SELECT COUNT(distinct A.person_id)
  INTO p_record_count
  FROM B,
       F,
       T,
       A
  LEFT OUTER JOIN cper.ehr_access_log log
    ON log.person_id = A.person_id
   AND log.access_page = 'login.do'
   AND trunc(log.access_time) BETWEEN
       to_date(p_startDate, 'yyyy-mm-dd') AND
       to_date(p_endDate, 'yyyy-mm-dd')
 WHERE B.position_id = F.position_id              -- normal business association
   AND T.field2 = A.org_id                        -- normal business association
   AND F.Organization_Id = A.org_id               -- normal business association
   AND A.person_id = F.person_id                  -- normal business association
   AND F.primary_flag = 'Y'                       -- normal business association
   AND sysdate BETWEEN F.effective_start_date AND
       F.effective_end_date                       -- normal business association
   AND log.person_id IS NULL
   AND A.enable_flag = 'Y';                       -- normal business association
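A minimal sketch of the DISTINCT idea from the first paragraph, expressed as a NOT IN against the deduplicated log rows; this assumes log.person_id is never NULL in the selected rows (a NULL there would make NOT IN return no rows), and whether it beats the NOT EXISTS depends on the optimizer:

-- replaces the NOT EXISTS (or the outer-join exclusion above)
AND A.person_id NOT IN
    (SELECT DISTINCT log.person_id
       FROM cper.ehr_access_log log
      WHERE log.access_page = 'login.do'
        AND trunc(log.access_time) BETWEEN
            to_date(p_startDate, 'yyyy-mm-dd') AND
            to_date(p_endDate, 'yyyy-mm-dd'))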
Upvotes: 1
Reputation: 18064
For better performance, you should index the table cper.ehr_access_log. Since it is a log table, set up a monthly schedule to rebuild the index on person_id. This will shave some time off the query.
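A hedged sketch of such an index; the index name is made up, and the column list simply mirrors the filters used in the question. Note that the trunc(access_time) call in the query prevents a clean range scan on the access_time column, so dropping the trunc (as suggested in another answer) or using a function-based index instead could help further:

-- hypothetical index name; columns follow the predicates in the question
CREATE INDEX cper.ehr_access_log_login_idx
    ON cper.ehr_access_log (person_id, access_page, access_time);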
Upvotes: 1
Reputation: 35227
The NOT EXISTS part of your query is known as a correlated subquery. You can generally get better performance if you rewrite it as an OUTER join.
The pattern is
select a from b where not exists (select 1 from c where b.a = c.a)
becomes
select a from b left outer join c on b.a = c.a where c.a is null
Upvotes: 2