Reputation: 8047
I have a project whereby I'm reading huge volumes of data from an Oracle database from Java.
I have the feeling that the application we are writing is going to process the data far faster than it will be given to us using a single threaded SELECT query and so I've been trying to research faster ways of obtaining the data.
Does anyone have anything I could read that would help me with my plight?
Upvotes: 3
Views: 6564
Reputation: 35401
Firstly, 'huge data' to database people is [at least] gigabytes, in which case I suspect your problems are going to be reading those sort of volumes into your processes memory and aggregating them there. Why do you think a single-threaded select will be the bottleneck ?
If the bottleneck were getting the data from disk, then having multiple threads pulling data from the same disk wouldn't necessarily be faster and may even be slower. But if you could spread the data over separate disks, separate threads would be faster. If, using SSD, you don't think disks will be a contention point,we can look elsewhere.
If the bottleneck was network bandwidth, again multiple threads wouldn't fit any more data through the pipe any faster. You may even benefit from unloading the data to a flat file, compressing it and transferring that.
If the select is being sorted or comes from a hash-join, you may use memory more efficiently with a single thread. Multiple sessions would have to share the machine's memory.
If there is a CPU intensive processing then multiple threads may help. That could be as simple as having multiple connections from java, each getting a different slice of data (eg A-K and L-Z), but it would very much depend on the SELECT.
I agree with dpbradley that you should determine the bottleneck first. If you have the data and select, it should be simple enough to determine how long it takes (both on the local machine and through the network), and a trace would be a necessary starting point to really go into how it could be speeded up.
Upvotes: 3
Reputation: 67792
Oracle supports parallel DML. In particular this applies to SELECT queries. Ultimately the bottleneck will probably be the IO read speed. Either use faster disks or stripe the data accross many disks.
As APC noted in the comments Parallel Queries/DML is an Entreprise Edition feature and is not available in the Standard Edition.
Also, Parallel DML/Query is not the solution to all performance problems. Since more than one process will be used by the query it may improve throughput, but at the cost of concurrency. The purpose of parallelism is to use more resources to process the query faster. If the query is IO-bound or CPU-bound, there is no extra resources to use and adding parallelism will only make matter worse.
From the link above:
Parallel execution is not normally useful for:
- Environments in which the CPU, memory, or I/O resources are already heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution will not yield any benefits and indeed may be detrimental to performance.
Upvotes: 4
Reputation: 1251
Use the setFetchSize(int) method on the Statement or PreparedStatement before you open the query. You should experiment with different sizes. Try 75 as a starting point.
On a slightly different useage, people have said that the PL/SQL bulk fetch "sweet spot" is between 2000 and 3000 but I saw one benchmark that indicated that 75 was optimum.
A large fetch size will tend to reduce the number of round trips between client and server. But if it is too large the database has to have a big buffer and the networking software may have to break up the big message into a lot of packets.
Upvotes: 3
Reputation: 11925
You haven't given us a lot of information on why it will be necessary to bring "huge volumes of data" into the Java application instead of processing it on the database side. Although there can be exceptions, usually this is signal to re-think the design. As a general rule with Oracle it is most efficient to do as much work as you can with pure set operations (SQL), followed by procedural processing with the rdbms engine (PL/SQL) before bringing results back to the client application.
Upvotes: 4