DPM

Reputation: 1660

Can reading from disk from different threads optimize a program?

I am wondering whether there is a way to optimize reading from disk in Java. For example, I want to print the contents of all text files in some directory after converting them to uppercase. I can create another thread to uppercase them, but can I optimize reading by adding another thread (or threads) to read files too? I mean 2, 3 or more threads to read different files from disk. Is there some optimization for doing this or not? I hope I have explained the problem clearly.

Upvotes: 0

Views: 96

Answers (2)

Andrew Henle

Reputation: 1

I can create another thread to uppercase them

That's actually going in the right direction, but simply making all letters uppercase doesn't take enough time to really matter unless you're processing really large chunks of the file.

It's the right direction because the standard single-threaded model of read-then-process means you're either reading data or processing it at any given moment, when you could be doing both at the same time.

For example, you could be creating a series of highly compressed (say, JPEG2000 because it's so CPU intensive) images from a large video stream file. You could have one thread reading frames from the stream, placing them into a queue to process, and then have N threads each processing a frame into an image.

You'd tune the number of threads reading data and the number of threads processing data to keep both your disks and CPUs maximally busy without excess contention.
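
As a rough illustration of the one-reader, N-worker layout described above, here is a minimal sketch. It uses the question's uppercase task as a stand-in for real CPU-heavy work, and the thread pool's internal work queue plays the role of the queue. The class name `ReadProcessPipeline`, the method `uppercaseAll`, and the `*.txt` filter are all hypothetical choices made for the example. For something as cheap as uppercasing, the hand-off will probably cost more than it saves.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical names throughout; a sketch, not a tuned implementation.
final class ReadProcessPipeline {

    // Reads every *.txt file in dir on the calling thread and uppercases
    // the contents on a pool of `workers` threads.
    static List<String> uppercaseAll(Path dir, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
                for (Path file : files) {
                    // All reading happens here, on a single thread.
                    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    // Processing is queued for the workers, so the next read
                    // can start while earlier files are still being processed.
                    Callable<String> task = () -> text.toUpperCase(Locale.ROOT);
                    pending.add(pool.submit(task));
                }
            }
            // Collect results in the order the files were read.
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `ReadProcessPipeline.uppercaseAll(dir, n)` with different values of `n` is one simple way to experiment with the tuning mentioned above. Note that the pool's queue is unbounded in this sketch, so a reader that is much faster than the workers will hold many files in memory at once.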

There are some cases where you can use multiple threads to read from a single file to get better performance. But you need a system designed from the ground up to do that. You need lots of disks (less so if they're SSDs), a pretty substantial IO infrastructure along with a system that has a lot of IO bandwidth, and then you need a file system that can handle multiple simultaneous accesses to a single file. Then the code you have to write to get better performance from reading with more than one thread has to match things like the physical layout of your files on disk.

That works best if you're doing lots of random reads from a file spread over multiple devices, as on a large, high-powered database server.

For example, let's say I have a huge data file spread over four or five disks (or even RAID arrays), with the file spread out over the disks in 64KB chunks. A handful of threads doing 64KB reads would be ideal to read or write such a file in a random-access mode. Let's say everything is really fast and you can read or write 1 GB/sec from such a file.

But if you turn around and just try to copy that data as a stream, you can still use multiple threads to get maximum performance, say 1 GB/sec. If you instead used a single thread doing read() calls in 1 MB chunks, you'd probably get 950 MB/sec, or 95% of the maximum multithreaded read performance.
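
To make the two styles concrete, here is a minimal sketch of both: several threads doing positional 64KB reads on one shared `FileChannel`, and a single thread doing plain 1 MB reads. The class `ReadStyles` and its method names are hypothetical, and nothing here measures throughput. Whether the striped version gains anything depends entirely on the storage underneath, as described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class and method names; a sketch for comparing the two styles.
final class ReadStyles {
    static final int STRIPE = 64 * 1024;      // the 64KB chunk size from the example
    static final int BIG_READ = 1024 * 1024;  // the 1 MB single-threaded read size

    // Several threads, each doing positional 64KB reads on a shared channel.
    // Returns the total number of bytes read.
    static long readStriped(Path file, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long stripes = (channel.size() + STRIPE - 1) / STRIPE;
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int first = t;
                Callable<Long> task = () -> {
                    ByteBuffer buf = ByteBuffer.allocate(STRIPE);
                    long bytes = 0;
                    // Worker t takes stripes t, t + threads, t + 2 * threads, ...
                    for (long s = first; s < stripes; s += threads) {
                        buf.clear();
                        int n;
                        // read(dst, position) leaves the channel's own position
                        // untouched, so workers do not disturb each other.
                        while (buf.hasRemaining()
                                && (n = channel.read(buf, s * STRIPE + buf.position())) > 0) {
                            bytes += n;
                        }
                    }
                    return bytes;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // One thread streaming through the file with 1 MB read() calls.
    static long readSequential(Path file) throws IOException {
        byte[] buf = new byte[BIG_READ];
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }
}
```

Timing `ReadStyles.readStriped(file, n)` against `ReadStyles.readSequential(file)` on your own hardware, with a file larger than the OS page cache, is the only way to know whether the extra threads buy anything.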

I've actually benchmarked such systems, and most of the time multithreaded IO isn't worth the trouble unless you've invested a lot of money in your hardware and software and you know exactly what you're doing when you set it up. Open-source file systems tend not to do this very well; you need to get into the realm of IBM's GPFS and Oracle's (née LSC's, then Sun's) QFS.

Upvotes: 1

Peter Lawrey

Reputation: 533520

I want to print the contents of all text files

Printing is most likely your bottleneck. If not, you should focus on what your bottleneck actually is, as optimising anything else is likely to complicate your code for no benefit.

I can create another thread to uppercase them,

You can, though passing the work to another thread could be more expensive than making it uppercase, depending on how you do this.

can I optimize reading by adding another thread (or threads) to read files too?

Possibly. How many disks do you have? If you have one disk, it can usually only do one thing at a time.

I mean 2, 3 or more threads to read different files from disk.

Most desktop drives can only do one operation at a time.

Is there some optimization for doing this or not?

Yes, but as I said, until you know what your bottleneck is, it's hard to jump to a solution.
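
One rough way to find out where the time goes is to time each phase separately. The sketch below is only an illustration under assumptions: the class `BottleneckTimer`, the method `timePhases`, and the `*.txt` filter are hypothetical, and `System.nanoTime()` around single calls is a coarse measurement rather than a proper benchmark.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical names; a coarse timing sketch, not a benchmark harness.
final class BottleneckTimer {

    // Returns elapsed nanoseconds as {reading, uppercasing, printing},
    // summed over every *.txt file in dir.
    static long[] timePhases(Path dir, PrintStream out) throws IOException {
        long read = 0, upper = 0, print = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : files) {
                long t0 = System.nanoTime();
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                long t1 = System.nanoTime();
                String shouted = text.toUpperCase(Locale.ROOT);
                long t2 = System.nanoTime();
                out.println(shouted);
                long t3 = System.nanoTime();

                read += t1 - t0;
                upper += t2 - t1;
                print += t3 - t2;
            }
        }
        return new long[] {read, upper, print};
    }
}
```

Calling `BottleneckTimer.timePhases(dir, System.out)` and comparing the three numbers gives a first hint. Keep in mind that a second run over the same files will usually be served from the OS file cache, which makes reading look much cheaper than it is on a cold start.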

Upvotes: 2
