刘伟科
刘伟科

Reputation: 432

How to make the 'not exist' faster

I have a problem. My DB is Oracle 9. Here is my SQL:

SELECT COUNT(distinct A.person_id)
        INTO p_record_count
        FROM A,
             B,
             F,
             T
       WHERE B.position_id = F.position_id **--Normal bussiness association**
         AND T.field2 = A.org_id           **--Normal bussiness association**
         AND F.Organization_Id = A.org_id  **--Normal bussiness association**
         AND A.person_id = F.person_id     **--Normal bussiness association**
         AND F.primary_flag = 'Y'          **--Normal bussiness association**
         and sysdate between F.effective_start_date and
             F.effective_end_date          **--Normal bussiness association**
         AND NOT EXISTS                    --very slow
                (SELECT log.person_id
                FROM cper.ehr_access_log log
               WHERE log.person_id = A.person_id
                 AND log.access_page = 'login.do'
                 AND trunc(log.access_time) BETWEEN
                     to_date(p_startDate, 'yyyy-mm-dd') AND
                     to_date(p_endDate, 'yyyy-mm-dd'))
         AND A.enable_flag = 'Y';            **--Normal bussiness association**

Requirement: Get people who didn't login.

The logic: Get all people I can get, then minus the people who login.

I have only a table with the login records.

Table cper.ehr_access_log has more than 10M records. It is a log table. This SQL takes about 30seconds.

Within 5s is possible? Thanks for your help.

Thank you~!I will try.Have a good weekend~ : )

Upvotes: 1

Views: 146

Answers (4)

Miserable Variable
Miserable Variable

Reputation: 28752

Table cper.ehr_access_log has more than 10M records.It is a log table.This sql takes about 30seconds. Within 5s is possible?

I would say it is not possible without adding some indexes. But you might want to try the following suggestions

  • Use a query plan to identify the hotspot. That will likely be tablescan required for ehr_access_log and you can probably use that argument to convince others that you need the index.

  • Try replacing subquery with outer join as others have suggested

  • You have joins on B and T but not predicates on them. Do you really need them in the query?

  • You might be able to save some time by avoiding trunc calls (not sure). As an alternative, modify p_startDate beginning of day and p_endDate to end of day

Upvotes: 3

nonsleepr
nonsleepr

Reputation: 811

I think you can optimize you query by selecting only DISTINCT log.person_id records, it will decrease amount of records from your log.

You can also try to make an EXCLUDE from the first part:

SELECT COUNT(distinct A.person_id)
    INTO p_record_count
    FROM A,
         B,
         F,
         T
   LEFT OUTER JOIN cper.ehr_access_log log
     ON log.person_id = A.person_id
        AND log.access_page = 'login.do'
        AND trunc(log.access_time) BETWEEN
        to_date(p_startDate, 'yyyy-mm-dd') AND
        to_date(p_endDate, 'yyyy-mm-dd')
   WHERE B.position_id = F.position_id **--Normal bussiness association**
     AND T.field2 = A.org_id           **--Normal bussiness association**
     AND F.Organization_Id = A.org_id  **--Normal bussiness association**
     AND A.person_id = F.person_id     **--Normal bussiness association**
     AND F.primary_flag = 'Y'          **--Normal bussiness association**
     and sysdate between F.effective_start_date and
         F.effective_end_date          **--Normal bussiness association**
     AND log.person_id IS null
     AND A.enable_flag = 'Y';            **--Normal bussiness association**

Upvotes: 1

Siva Charan
Siva Charan

Reputation: 18064

For a better performance, you must INDEX the table cper.ehr_access_log

Since it is a LOG table, make a monthly schedule to INDEX this table based on person_id. This will reduce some amount of time in quering.

Upvotes: 1

Ed Guiness
Ed Guiness

Reputation: 35227

The 'not exists' part of your query is known as a correlated subquery. You can generally get better performance if you re-write as an OUTER join.

The pattern is

select a from b where not exists (select 1 from c where b.a = c.a)

becomes

select a from b left outer join c on b.a = c.a where c.a is null

Upvotes: 2

Related Questions