Brabin
Brabin

Reputation: 23

Oracle count (*) is taking too much time

I was trying to fetch the count(*) from the table, which has almost 7 million records and it taking more than an hour for returning the result.

Also the table has 153 columns out of which index has been created for column 123, so tried to run the following query in parallel, but it didn't help.

select /*+ parallel (5) */ count(123) from <table_name>

Please suggest if there is alternative way.

When I ran desc on the table in Toad, the index tab holds the value of no. of rows. Any idea how that value is getting updated there?

Upvotes: 2

Views: 22514

Answers (3)

APC
APC

Reputation: 146239

select /*+ parallel (5) */ 

Seems like odd number for degree of parallelism. Well, obvious 5 is an odd number, and that is strange. The DoPs ought to be a power multiple of two (see below for more).

Anyway, do you have a reason for using parallel query? Do you have at least five spare processors? If not, there is a good chance the overhead of managing the PQ slaves is at least contributing to the poor performance.


Why should DOP = n*2? There is an established heuristic based on Queuing Theory that running more than two batch jobs simultaneously leads to degraded performance. Find out more. (I think queuing theory actually recommends a figure of 1.8, but as database jobs are often bound by I/O or disk we can usually get away with 2.)

I originally said "power of 2" but that's mainly because multi-core servers tend to have a number of CPUs which is a power of 2, but multiple of 2 is more accurate, because some boxes have 12 CPUs or some other number.

Now, if we have a 64 core box, a DOP of 5 or 37 is fine, because we have enough CPUs to run that many threads simultaneously. But if we have got a small quadcore box, only 2, 4 or 8 makes sense, because those are the only values which will ensure an even distribution of work across all four processors. Running five threads on a quadcore box means one CPU will be doing a lot more work than the other three; there is a possibility that it will take longer to finish, leaving the other three slaves waiting. So DOP=5 can actually lead to a greater elapsed time than DOP=4.

DOP=n*2 is only a rule of thumb, and not set in stone. However, it is based on sound reasoning, and we should know why we're doing something different. Obviously, we should have conducted some experiments to confirm that we have chosen the right DOP (whatever value we settle on).

Upvotes: 0

ntalbs
ntalbs

Reputation: 29458

Counting the number of rows of large table takes long time. It's natural. Some DBMS stores the number of records, however, this kinds of DBMS limits concurrency. It should lock the entire table before DML operation on the table. (The entire table lock is necessary to update the count properly.)

The value in ALL_TABLES.NUM_ROWS (or USER_TABLES.NUM_ROWS) is just a statistical information generated by analyze table ... or dbms_stats.gather_table_stats procedure. It's not accurate, not real-time information.

If you don't need the exact number of rows, you can use the statistical information. However you shouldn't depend on it. It's used by Oracle optimizer, but shouldn't in application program.

I'm not sure why you have to count the number of rows of the table. If you need it in the batch program which is run infrequently, you can partition the table to increase the parallelism. If you need the count in online program, you should find a way not to use the count.

Upvotes: 2

David Aldridge
David Aldridge

Reputation: 52356

A few issues to mention:

  1. For "select count(*) from table" to use an index, the indexed column must be non-nullable, or the index must be a bitmap type.
  2. If there are known to be no nulls in the column but there is no not null constraint on it, then use "select count(*) from table where column_name is not null".
  3. It does of course have to be more efficient to scan the index than the table, but with so many table columns you're probably fine there.
  4. If you really want a parallel index scan, use the parallel_index hint, not parallel. But with only 7 million rows you might not find any need for parallelism.
  5. You need to check the execution plan to see if an index and/or parallel query is in use.
  6. If you can use an estimated number of rows then consider using the sample clause: for example "select 1000*count(*) from table sample(0.1)"

Upvotes: 4

Related Questions