I wrote a script that reads a 100 MB+ text file, in two versions: one single-threaded and one multi-threaded. The multi-threaded version shares a single StreamReader between threads and locks it during each StreamReader.ReadLine() call. When I timed them, both ran at about the same speed (it seems that ReadLine() is what's taking up most of the run time).
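For illustration, here is a minimal Python sketch of the setup described above. The original is C#: Python's file `readline()` stands in for `StreamReader.ReadLine()`, and `count_lines_shared_reader` is a hypothetical name. Every read happens while holding the same lock, so the reads are fully serialized no matter how many threads you start.

```python
import threading

def count_lines_shared_reader(path, num_threads=4):
    """Several worker threads share one file object; a lock serializes every readline()."""
    lock = threading.Lock()
    counts = [0] * num_threads  # one counter per thread, so no extra locking is needed

    with open(path, "r", encoding="utf-8") as reader:
        def worker(idx):
            while True:
                with lock:  # only one thread can be reading at any moment
                    line = reader.readline()
                if not line:  # readline() returns "" at end of file
                    break
                counts[idx] += 1  # stand-in for the real per-line processing

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sum(counts)
```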
Where can I take this next? I'm thinking of splitting the source file into multiple text files so each thread can work with its own StreamReader, but that seems a bit cumbersome. Is there a better way to speed up my process?
Thanks!
Upvotes: 2
Views: 1507
With a single hard disk, there's not much you can do except use a single-producer (reading the file), multiple-consumer (processing) model. A hard disk has to move its mechanical "head" to seek to the next read position. Multiple threads reading at different positions will just bounce the head around and bring no speedup (worse, in some cases it may be slower).
Splitting the input file into multiple files is even worse, because the pieces are no longer stored consecutively on disk, and reading them in parallel needs even more seeking.
So use a single thread to read chunks of the large file. Then either put the work items in a synchronized queue (e.g. ConcurrentQueue) for multiple consumer threads to pick up, or use ThreadPool.QueueUserWorkItem to hand them to the built-in thread pool.
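Here is a minimal Python sketch of the single-producer, multiple-consumer model. It assumes the work can be split into chunks of lines. `queue.Queue` plays the role of `ConcurrentQueue`: it is thread-safe and can block when full or empty. The names `process_file_producer_consumer` and `process_chunk` and the chunk sizes are hypothetical. Only the calling thread touches the file, so reads stay sequential.

One Python-specific caveat applies. CPython's global interpreter lock keeps CPU-bound Python code from running in parallel across threads, so for heavy processing you would use processes (e.g. `multiprocessing`) with the same structure. In C# the threads run truly in parallel.

```python
import queue
import threading

_SENTINEL = None  # tells one consumer that no more chunks are coming

def process_file_producer_consumer(path, process_chunk, num_consumers=4,
                                   lines_per_chunk=10_000, max_queued_chunks=8):
    """One thread reads the file sequentially; several consumer threads process chunks of lines."""
    work = queue.Queue(maxsize=max_queued_chunks)  # bounded: the reader blocks if consumers fall behind
    results = []  # note: completion order, not file order
    results_lock = threading.Lock()

    def consumer():
        while True:
            chunk = work.get()
            if chunk is _SENTINEL:
                break
            r = process_chunk(chunk)
            with results_lock:
                results.append(r)

    consumers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for t in consumers:
        t.start()

    # Producer: the calling thread is the only one reading the file.
    with open(path, "r", encoding="utf-8") as reader:
        chunk = []
        for line in reader:
            chunk.append(line)
            if len(chunk) == lines_per_chunk:
                work.put(chunk)
                chunk = []
        if chunk:  # final partial chunk
            work.put(chunk)

    for _ in consumers:
        work.put(_SENTINEL)  # one sentinel per consumer so each one exits
    for t in consumers:
        t.join()
    return results
```

The bounded queue is a deliberate choice. If the consumers fall behind, `put()` blocks the reader instead of letting it pull the whole file into memory. The sketch has no error handling: if `process_chunk` raises, that consumer stops. The thread-pool alternative would roughly correspond to submitting each chunk to `concurrent.futures.ThreadPoolExecutor`.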
Upvotes: 5
Where can you take this next?
Add multiple HDDs, then have one thread per HDD: split your file across the drives so each thread reads its own part sequentially. Kind of like RAID.
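Here is a Python sketch of the one-thread-per-drive idea. It assumes the file has already been split into parts, each living on a different physical drive. The split itself isn't shown, and the part paths you'd pass in are hypothetical, as is the name `read_parts_one_thread_per_disk`. Each reader thread reads its own part sequentially, and all lines funnel into one thread-safe queue. On a single disk this layout would just reintroduce head seeking.

```python
import queue
import threading

def read_parts_one_thread_per_disk(part_paths, process_line):
    """One reader thread per file part (each part assumed to be on its own drive)."""
    lines = queue.Queue(maxsize=10_000)  # bounded so readers can't outrun the processor indefinitely
    done = object()  # sentinel each reader sends when its part is finished

    def reader(path):
        try:
            with open(path, "r", encoding="utf-8") as f:  # sequential read of one part
                for line in f:
                    lines.put(line)
        finally:
            lines.put(done)  # always signal, even if the read fails

    readers = [threading.Thread(target=reader, args=(p,)) for p in part_paths]
    for t in readers:
        t.start()

    results = []
    finished = 0
    while finished < len(readers):  # the calling thread processes until every reader is done
        item = lines.get()
        if item is done:
            finished += 1
        else:
            results.append(process_line(item))
    for t in readers:
        t.join()
    return results
```

Lines from different parts arrive interleaved, so this only fits processing that doesn't depend on the original line order.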
EDIT: Similar questions have been asked many times here. Just use one thread to read the file and one thread to process it. No further multithreading needed.
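Here is a Python sketch of the two-thread version, with one assumption beyond the answer: the reader thread does only raw I/O, fetching large binary blocks with `f.read()`. The processing thread does all line splitting and decoding, so the disk thread never waits on parsing. `two_thread_pipeline`, `process_line`, and the 1 MiB default block size are hypothetical choices. The processing thread carries any partial line over to the next block as bytes. This also keeps multi-byte UTF-8 characters that straddle a block boundary intact.

```python
import queue
import threading

def two_thread_pipeline(path, process_line, block_size=1 << 20):
    """Thread 1 reads raw blocks from disk; thread 2 splits them into lines and processes them."""
    blocks = queue.Queue(maxsize=16)  # bounded so the reader can't run far ahead of the processor

    def read_blocks():
        try:
            with open(path, "rb") as f:
                while True:
                    block = f.read(block_size)
                    if not block:  # b"" means end of file
                        break
                    blocks.put(block)
        finally:
            blocks.put(None)  # end-of-input marker

    results = []

    def process_blocks():
        leftover = b""  # incomplete line carried over from the previous block
        while True:
            block = blocks.get()
            if block is None:
                break
            # Split on b"\n" only; a Windows "\r" would stay at the end of each line.
            *complete, leftover = (leftover + block).split(b"\n")
            for raw in complete:
                results.append(process_line(raw.decode("utf-8")))
        if leftover:  # last line had no trailing newline
            results.append(process_line(leftover.decode("utf-8")))

    reader = threading.Thread(target=read_blocks)
    worker = threading.Thread(target=process_blocks)
    reader.start()
    worker.start()
    reader.join()
    worker.join()
    return results
```

Because there is a single processing thread, results come back in file order.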
Upvotes: 0