Reputation: 23
I was trying to fetch the count(*)
from the table, which has almost 7 million records and it taking more than an hour for returning the result.
Also the table has 153 columns out of which index has been created for column 123, so tried to run the following query in parallel, but it didn't help.
select /*+ parallel (5) */ count(123) from <table_name>
Please suggest if there is alternative way.
When I ran desc
on the table in Toad, the index tab holds the value of no. of rows. Any idea how that value is getting updated there?
Upvotes: 2
Views: 22514
Reputation: 146239
select /*+ parallel (5) */
Seems like odd number for degree of parallelism. Well, obvious 5 is an odd number, and that is strange. The DoPs ought to be a power multiple of two (see below for more).
Anyway, do you have a reason for using parallel query? Do you have at least five spare processors? If not, there is a good chance the overhead of managing the PQ slaves is at least contributing to the poor performance.
Why should DOP = n*2? There is an established heuristic based on Queuing Theory that running more than two batch jobs simultaneously leads to degraded performance. Find out more. (I think queuing theory actually recommends a figure of 1.8, but as database jobs are often bound by I/O or disk we can usually get away with 2.)
I originally said "power of 2" but that's mainly because multi-core servers tend to have a number of CPUs which is a power of 2, but multiple of 2 is more accurate, because some boxes have 12 CPUs or some other number.
Now, if we have a 64 core box, a DOP of 5 or 37 is fine, because we have enough CPUs to run that many threads simultaneously. But if we have got a small quadcore box, only 2, 4 or 8 makes sense, because those are the only values which will ensure an even distribution of work across all four processors. Running five threads on a quadcore box means one CPU will be doing a lot more work than the other three; there is a possibility that it will take longer to finish, leaving the other three slaves waiting. So DOP=5
can actually lead to a greater elapsed time than DOP=4
.
DOP=n*2
is only a rule of thumb, and not set in stone. However, it is based on sound reasoning, and we should know why we're doing something different. Obviously, we should have conducted some experiments to confirm that we have chosen the right DOP (whatever value we settle on).
Upvotes: 0
Reputation: 29458
Counting the number of rows of large table takes long time. It's natural. Some DBMS stores the number of records, however, this kinds of DBMS limits concurrency. It should lock the entire table before DML operation on the table. (The entire table lock is necessary to update the count properly.)
The value in ALL_TABLES.NUM_ROWS
(or USER_TABLES.NUM_ROWS
) is just a statistical information generated by analyze table ...
or dbms_stats.gather_table_stats
procedure. It's not accurate, not real-time information.
If you don't need the exact number of rows, you can use the statistical information. However you shouldn't depend on it. It's used by Oracle optimizer, but shouldn't in application program.
I'm not sure why you have to count the number of rows of the table. If you need it in the batch program which is run infrequently, you can partition the table to increase the parallelism. If you need the count in online program, you should find a way not to use the count.
Upvotes: 2
Reputation: 52356
A few issues to mention:
Upvotes: 4