Reputation: 1898

SELECT COUNT(*) vs. fetching twice with an explicit cursor

I have read a book whose title is "Oracle PL SQL Programming" (2nd ed.) by Steven Feuerstein & Bill Pribyl. On page 99, there is a point suggested that

Do not "SELECT COUNT(*)" from a table unless you really need to know the total number of "hits." If you only need to know whether there is more than one match, simply fetch twice with an explicit cursor.

Could you anyone explain this point more to me by providing example? Thank you.

Update:

As Steven Feuerstein & Bill Pribyl recommends us not to use SELECT COUNT() to check whether records in a table exist or not, could anyone help me edit the code below in order to avoid using SELECT COUNT(*) by using explicit cursor instead? This code is written in the Oracle stored procedure.

I have a table emp(emp_id, emp_name, ...), so to check the provided employee ID corret or not:

CREATE OR REPLACE PROCEDURE do_sth ( emp_id_in IN emp.emp_id%TYPE )
IS
v_rows INTEGER;
BEGIN
    ...

    SELECT COUNT(*) INTO v_rows
    FROM emp
    WHERE emp_id = emp_id_in;

    IF v_rows > 0 THEN
        /* do sth */
    END;

    /* more statements */
    ...

END do_sth;

Upvotes: 11

Answers (8)

Tony Andrews

Reputation: 132680

There are a number of reasons why developers might perform select COUNT(*) from a table in a PL/SQL program:

1) They genuinely need to know how many rows there are in the table.

In this case there is no choice: select COUNT(*) and wait for the result. This will be pretty fast on many tables, but could take a while on a big table.

2) They just need to know whether a row exists or not.

This doesn't warrant counting all the rows in the table. A number of techniques are possible:

a) Explicit cursor method:

DECLARE
   CURSOR c IS SELECT '1' dummy FROM mytable WHERE ...;
   v VARCHAR2(1);
BEGIN
   OPEN c;
   FETCH c INTO v;
   IF c%FOUND THEN
      -- A row exists
      ...
   ELSE
      -- No row exists
      ...
   END IF;
END;

b) SELECT INTO method

DECLARE
   v VARCHAR2(1);
BEGIN
   SELECT '1' INTO v FROM mytable 
   WHERE ... 
   AND ROWNUM=1; -- Stop fetching if 1 found
   -- At least one row exists
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      -- No row exists
END;

c) SELECT COUNT(*) with ROWNUM method

DECLARE
   cnt INTEGER;
BEGIN
   SELECT COUNT(*) INTO cnt FROM mytable 
   WHERE ... 
   AND ROWNUM=1; -- Stop counting if 1 found
   IF cnt = 0 THEN
      -- No row found
   ELSE
      -- Row found
   END IF;
END;

3) They need to know whether more than 1 row exists.

Variations on the techniques for (2) work:

a) Explicit cursor method:

DECLARE
   CURSOR c IS SELECT '1' dummy FROM mytable WHERE ...;
   v VARCHAR2(1);
BEGIN
   OPEN c;
   FETCH c INTO v;
   FETCH c INTO v;
   IF c%FOUND THEN
      -- 2 or more rows exists
      ...
   ELSE
      -- 1 or 0 rows exist
      ...
   END IF;
END;

b) SELECT INTO method

DECLARE
   v VARCHAR2(1);
BEGIN
   SELECT '1' INTO v FROM mytable 
   WHERE ... ;
   -- Exactly 1 row exists
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      -- No row exists
   WHEN TOO_MANY_ROWS THEN
      -- More than 1 row exists
END;

c) SELECT COUNT(*) with ROWNUM method

DECLARE
   cnt INTEGER;
BEGIN
   SELECT COUNT(*) INTO cnt FROM mytable 
   WHERE ... 
   AND ROWNUM <= 2; -- Stop counting if 2 found
   IF cnt = 0 THEN
      -- No row found
   IF cnt = 1 THEN
      -- 1 row found
   ELSE
      -- More than 1 row found
   END IF;
END;

Which method you use is largely a matter of preference (and some religious zealotry!) Steven Feuerstein has always favoured explicit cursors over implicit (SELECT INTO and cursor FOR loops); Tom Kyte favours implicit cursors (and I agree with him).

The important point is that to select COUNT(*) without restricting the ROWCOUNT is expensive and should therefore only be done when a count is trully needed.

As for your supplementary question about how to re-write this with an explicit cursor:

CREATE OR REPLACE PROCEDURE do_sth ( emp_id_in IN emp.emp_id%TYPE )
IS
v_rows INTEGER;
BEGIN
    ...

    SELECT COUNT(*) INTO v_rows
    FROM emp
    WHERE emp_id = emp_id_in;

    IF v_rows > 0 THEN
        /* do sth */
    END;

    /* more statements */
    ...

END do_sth;

That would be:

CREATE OR REPLACE PROCEDURE do_sth ( emp_id_in IN emp.emp_id%TYPE )
IS
    CURSOR c IS SELECT 1
                FROM emp
                WHERE emp_id = emp_id_in;
    v_dummy INTEGER;
BEGIN
    ...

    OPEN c;    
    FETCH c INTO v_dummy;
    IF c%FOUND > 0 THEN
        /* do sth */
    END;
    CLOSE c;

    /* more statements */
    ...

END do_sth;

But really, in your example it is no better or worse, since you are selecting the primary key and Oracle is clever enough to know that it only needs to fetch once.

Upvotes: 22

Erich Kitzmueller

Reputation: 36987

Before you take Steven Feuerstein's suggestions too serious, just do a little benchmark. Is count(*) noticeably slower than the explicit cursor in your case? No? Then better use the construct that allows for simple, readable code. Which, in most cases, would be "select count(*) into v_cnt ... if v_cnt>0 then ..."

PL/SQL allows for very readable programs. Don't waste that just to nano-optimize.

Upvotes: 1

Samnang

Reputation: 5616

If you want to get number of rows in a table, please don't used count(*), I would suggest count(0) that 0 is the column index of your primary key column.

Upvotes: -2

Cade Roux

Reputation: 89721

He means open a cursor and fetch not only the first record but the second, and then you will know there is more than one.

Since I never seem to need to know that SELECT COUNT(*) is >= 2, I have no idea why this is a useful idiom in any SQL variant. Either no records or at least one, sure, but not two or more. And anyway, there's always EXISTS.

That, and the fact that Oracle's optimizer seems to be pretty poor... - I would question the relevance of the technique.

To address TheSoftwareJedi's comments:

WITH CustomersWith2OrMoreOrders AS (
    SELECT CustomerID
    FROM Orders
    GROUP BY CustomerID
    HAVING COUNT(*) >= 2
)
SELECT Customer.*
FROM Customer
INNER JOIN CustomersWith2OrMoreOrders
    ON Customer.CustomerID = CustomersWith2OrMoreOrders.CustomerID

Appropriately indexed, I've never had performance problems even with whole universe queries like this in SQL Server. However, I have consistently run into comments about Oracle optimizer problems here and on other sites.

My own experience with Oracle has not been good.

The comment from the OP appears to be saying that full COUNT(*) from tables are not well handled by the optimizer. i.e.:

IF EXISTS (SELECT COUNT(*) FROM table_name HAVING COUNT(*) >= 2)
BEGIN
END

(which, when a primary key exists, can be reduced to a simple index scan - in a case of extreme optimization, one can simply query the index metadata in sysindexes.rowcnt - to find the number of entries - all without a cursor) is to be generally avoided in favor of:

DECLARE CURSOR c IS SELECT something FROM table_name;
BEGIN
    OPEN c
    FETCH c INTO etc. x 2 and count rows and handle exceptions
END;

IF rc >= 2 THEN BEGIN
END

That, to me would result in less readable, less portable, and less maintainable code.

Upvotes: 1

yfeldblum

Reputation: 65445

SQL Server:

if 2 = (
    select count(*) from (
        select top 2 * from (
            select T = 1 union
            select T = 2 union
            select T = 3 ) t) t)
    print 'At least two'

Also, don't ever use cursors. If you think you really really need them, beat yourself with a shovel until you change your mind. Let relics from an ancient past remain relics from an ancient past.

Upvotes: -1

EvilTeach

Reputation: 28882

If two is all you are interested in, try

SELECT 'THERE ARE AT LEAST TWO ROWS IN THE TABLE'
FROM DUAL
WHERE 2 =
(
    SELECT COUNT(*)
    FROM TABLE
    WHERE ROWNUM < 3
)

It will take less code than doing the manual cursor method, and it is likely to be faster.

The rownum trick means to stop fetching rows once it has two of them.

If you don't put some sort of limit on the count(*), it could take a long while to finish, depending on the number of rows you have. In that case, using a cursor loop, to read 2 rows from the table manually, would be faster.

Upvotes: 5

TheSoftwareJedi

Reputation: 35236

This comes from programmers writing code similar to the following (this is psuedo code!).

You want to check to see if the customer has more than one order:

if ((select count(*) from orders where customerid = :customerid) > 1)
{
    ....
}

That is a terribly inefficient way to do things. As Mark Brady would say, if you want to know if a jar contains pennies, would you count all the pennies in the jar, or just make sure there is 1 (or 2 in your example)?

This could be better written as:

if ((select 1 from (select 1 from orders where customerid = :customerid) where rownum = 2) == 1)
{
    ....
}

This prevents the "counting all of the coins" dilemma since Oracle will fetch 2 rows, then finish. The previous example would cause oracle to scan (an index or table) for ALL rows, then finish.

Upvotes: 3

Mike Kale

Reputation: 4133

Depending on the DB, there may be a sys table which stores an approximate count and can be queried in constant time. Useful if you want to know whether the table has 20 rows or 20,000 or 20,000,000.

Upvotes: 0

SELECT COUNT(*) vs. fetching twice with an explicit cursor

Update:

Answers (8)

1) They genuinely need to know how many rows there are in the table.

2) They just need to know whether a row exists or not.

a) Explicit cursor method:

b) SELECT INTO method

c) SELECT COUNT(*) with ROWNUM method

3) They need to know whether more than 1 row exists.

a) Explicit cursor method:

b) SELECT INTO method

c) SELECT COUNT(*) with ROWNUM method

Related Questions