Developer

Reputation:

Reading Multiple Files in Multiple Threads using C# Is Slow

I have an Intel Core 2 Duo CPU, and I was reading 3 files from my C: drive and showing some matching values from the files in an EditBox on screen. The whole process takes 2 minutes. Then I thought of processing each file in a separate thread, and now the whole process takes 2 minutes 30 seconds, i.e. 30 seconds more than single-threaded processing!

I was expecting the other way around! I can see activity on both graphs in the CPU usage history. Can someone please explain what is going on? Here is my code snippet:

foreach (FileInfo file in FileList)
{
    // One thread per file; ProcessFileData receives the full path as its object argument.
    Thread t = new Thread(new ParameterizedThreadStart(ProcessFileData));
    t.Start(file.FullName);
}

where ProcessFileData is the method that processes the files.

Thanks!

Upvotes: 8

Views: 7560

Answers (4)

Michael La Voie

Reputation: 27926

The root of the problem is that the files are on the same drive and, unlike your dual core processor, your hard drive can only do one thing at a time.

If you read two files simultaneously, the disk heads will jump from one file to the other and back again. Given that your hard drive can read each file in roughly 40 seconds, it now has the additional overhead of moving its disk head between the three separate files many times during the read.

The fastest way to read multiple files from a single hard drive is to do it all in one thread and read them one after another. This way, the head only moves once per file read (at the very beginning) and not multiple times per read.
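A minimal sketch of that one-after-another approach, assuming the work is a simple text match. `SequentialReader`, `CountMatches`, and the `term` parameter are hypothetical stand-ins for the question's `ProcessFileData`, which is not shown. All files are handled on a single worker thread, so only one file is being read at any moment:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public static class SequentialReader
{
    // Hypothetical stand-in for ProcessFileData: counts lines containing a search term.
    public static int CountMatches(string path, string term)
    {
        int matches = 0;
        // File.ReadLines streams the file line by line, front to back.
        foreach (string line in File.ReadLines(path))
        {
            if (line.Contains(term)) matches++;
        }
        return matches;
    }

    // Processes every file one after another on a single worker thread.
    public static Dictionary<string, int> ProcessAll(IEnumerable<string> paths, string term)
    {
        var results = new Dictionary<string, int>();
        var worker = new Thread(() =>
        {
            foreach (string path in paths)
            {
                results[path] = CountMatches(path, term);
            }
        });
        worker.IsBackground = true;
        worker.Start();
        // Blocking keeps the sketch short; a UI application would instead
        // hand the results back to the UI thread when the worker finishes.
        worker.Join();
        return results;
    }
}
```

The single worker keeps the UI thread free without letting reads of different files compete for the disk head.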

To optimize this process, you'll need to either change your logic (do you really need to read the whole contents of all three files?) or change your hardware: buy a faster hard drive, put the 3 files on three different hard drives and use threading, or use a RAID.

Upvotes: 13

RickNZ

Reputation: 18654

If you read from disk using multiple threads, then the disk heads will bounce around from one part of the disk to another as each thread reads from a different part of the drive. That can reduce throughput significantly, as you've seen.

For that reason, it's actually often a better idea to have all disk accesses go through a single thread, to help minimize disk seeks.
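One way to funnel all disk access through a single thread while still using both cores is a small producer/consumer pipeline. The sketch below is illustrative only: `SingleReaderPipeline`, the text-matching workload, and the queue capacity of 4 are assumptions, not something from the question. One task reads the files sequentially, and the worker tasks only touch data that is already in memory:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SingleReaderPipeline
{
    public static int CountMatches(IEnumerable<string> paths, string term, int workerCount)
    {
        // Bounded so the reader cannot run far ahead of the workers and exhaust memory.
        using (var queue = new BlockingCollection<string[]>(boundedCapacity: 4))
        {
            int total = 0;

            // The only task that touches the disk: reads files one after another.
            Task reader = Task.Run(() =>
            {
                try
                {
                    foreach (string path in paths)
                        queue.Add(File.ReadAllLines(path));
                }
                finally
                {
                    queue.CompleteAdding(); // lets the workers exit once the queue drains
                }
            });

            // CPU-side workers: scan file contents that are already in memory.
            Task[] workers = Enumerable.Range(0, workerCount).Select(_ => Task.Run(() =>
            {
                foreach (string[] lines in queue.GetConsumingEnumerable())
                {
                    int matches = lines.Count(l => l.Contains(term));
                    Interlocked.Add(ref total, matches);
                }
            })).ToArray();

            Task.WaitAll(workers);
            reader.Wait(); // surfaces any read error as an AggregateException
            return total;
        }
    }
}
```

This only pays off if the per-file processing is heavy enough to keep the workers busy; if the job is dominated by reading, the extra threads mostly sit idle.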

If your task is I/O bound and if it needs to run often, you might look at a tool like "contig" to make sure the layout of your files on disk is optimized / contiguous.

Upvotes: 3

No Refunds No Returns

Reputation: 8336

Since your process is I/O bound, you should let the OS do your threading for you. Look at FileStream.BeginRead() for an example of how to queue up your reads. The callback in which you call EndRead() can issue the request for the next block of data, passing itself as the callback again so that it handles each subsequent completed block.

Also, when you create additional threads, the OS has to manage more threads. And if a different CPU happens to get picked to handle the completed read, you lose whatever the CPU that started the read had cached for your thread.
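A rough sketch of that chained BeginRead/EndRead pattern follows. `ChainedReader`, the 64 KB buffer size, and the byte counting (used here in place of real processing) are assumptions made for illustration. The loop in `Pump` handles reads that complete synchronously, so the callback chain does not nest ever deeper on the stack. In current .NET code the same idea is more commonly written with `ReadAsync` and `await`:

```csharp
using System;
using System.IO;
using System.Threading;

public sealed class ChainedReader
{
    private readonly byte[] _buffer = new byte[64 * 1024];
    private readonly ManualResetEvent _done = new ManualResetEvent(false);
    private readonly FileStream _stream;
    private long _totalBytes;
    private Exception _error;

    public ChainedReader(string path)
    {
        // FileOptions.Asynchronous asks the OS for overlapped (asynchronous) I/O.
        _stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                 _buffer.Length,
                                 FileOptions.Asynchronous | FileOptions.SequentialScan);
    }

    // Starts the chain, waits for it to finish, and returns the number of bytes read.
    public long ReadToEnd()
    {
        Pump();
        _done.WaitOne();
        _stream.Dispose();
        if (_error != null) throw new IOException("Asynchronous read failed.", _error);
        return _totalBytes;
    }

    // Issues reads until one is left pending; the callback then continues the chain.
    private void Pump()
    {
        try
        {
            while (true)
            {
                IAsyncResult ar = _stream.BeginRead(_buffer, 0, _buffer.Length, OnReadCompleted, null);
                if (!ar.CompletedSynchronously) return;
                if (!Consume(ar)) return;
            }
        }
        catch (Exception ex) { Fail(ex); }
    }

    private void OnReadCompleted(IAsyncResult ar)
    {
        if (ar.CompletedSynchronously) return; // already handled by the loop in Pump
        try
        {
            if (Consume(ar)) Pump();
        }
        catch (Exception ex) { Fail(ex); }
    }

    // Completes one read; returns false at end of file.
    private bool Consume(IAsyncResult ar)
    {
        int read = _stream.EndRead(ar);
        if (read == 0) { _done.Set(); return false; }
        _totalBytes += read; // real processing of the first `read` bytes of _buffer goes here
        return true;
    }

    private void Fail(Exception ex)
    {
        _error = ex;
        _done.Set();
    }
}
```

Only one read is in flight at a time, so the single buffer and the running total need no locking.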

As you've found, you can't "speed up" an application just by adding threads.

Upvotes: 0

RageZ

Reputation: 27313

If your processing is mostly I/O bound rather than CPU bound, it makes sense that it takes the same time or even longer.

How do you compare those files? You should think about what the bottleneck of your application is: disk I/O, CPU, memory ...

Multithreading is only worthwhile for CPU-bound processing, e.g. complex calculations, comparison of data in memory, sorting, etc.
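One way to answer the bottleneck question is to time the read and the in-memory work separately. This is a minimal sketch: `BottleneckProbe` and the line-matching workload are hypothetical, since the question does not show what `ProcessFileData` does.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BottleneckProbe
{
    // Times reading the file and scanning it in memory as two separate phases.
    public static void Measure(string path, string term,
                               out TimeSpan readTime, out TimeSpan processTime, out int matches)
    {
        Stopwatch sw = Stopwatch.StartNew();
        string[] lines = File.ReadAllLines(path); // disk I/O plus text decoding
        readTime = sw.Elapsed;

        sw.Restart();
        matches = 0;
        foreach (string line in lines)
        {
            if (line.Contains(term)) matches++;   // pure CPU and memory work
        }
        processTime = sw.Elapsed;
    }
}
```

If `readTime` dominates, extra threads will not help. Note that the operating system caches recently read files, so a second run over the same file can report a much shorter read time than the first.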

Upvotes: 1
