Young Hyun Yoo

Reputation: 598

What is the fastest way to read files using C++?

By "read files" I mean that I will read every document (doc, docx, xls, xml, txt, ...) on my hard disk.

Most of my files will be about 10KB ~ 1MB, I think.

I'll read each file and check whether its text contains any specific words.

So my guess is that I should have a thread pool, with one thread reading files and the other threads doing the filtering.

I heard that I could use MMF (memory-mapped files), CreateFile/ReadFile, or I/O completion ports to read each file.
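
A minimal sketch of that layout, assuming C++11 threads: one reader thread loads files in order and hands them to filter threads through a blocking queue. The names `LoadedFile`, `FileQueue` and `find_files_containing` are hypothetical, and the search is a plain substring match on the raw bytes.

```cpp
#include <cassert>
#include <condition_variable>
#include <fstream>
#include <iterator>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical work item: a file's path plus its raw bytes.
struct LoadedFile {
    std::string path;
    std::string data;
};

// Minimal blocking queue shared by the reader and the filter threads.
class FileQueue {
public:
    void push(LoadedFile f) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(f));
        }
        cv_.notify_one();
    }
    // Signals that no more files will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a file is available; returns false once closed and drained.
    bool pop(LoadedFile& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<LoadedFile> q_;
    bool closed_ = false;
};

// Returns the paths whose contents contain `word` (order is not guaranteed).
std::vector<std::string> find_files_containing(const std::vector<std::string>& paths,
                                               const std::string& word,
                                               unsigned filter_threads = 2) {
    assert(filter_threads > 0);  // with no workers nothing would be filtered
    FileQueue queue;
    std::mutex result_mutex;
    std::vector<std::string> matches;

    // Single reader: only one file is being read from disk at any time.
    std::thread reader([&] {
        for (const auto& p : paths) {
            std::ifstream in(p, std::ios::binary);
            if (!in) continue;  // skip files that cannot be opened
            std::string data((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
            queue.push({p, std::move(data)});
        }
        queue.close();
    });

    // Filter workers: search each loaded file for the word.
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < filter_threads; ++i) {
        workers.emplace_back([&] {
            LoadedFile f;
            while (queue.pop(f)) {
                if (f.data.find(word) != std::string::npos) {
                    std::lock_guard<std::mutex> lock(result_mutex);
                    matches.push_back(f.path);
                }
            }
        });
    }

    reader.join();
    for (auto& w : workers) w.join();
    return matches;
}
```

A raw byte search like this only makes sense for plain-text formats such as txt or xml. A docx file is a ZIP archive, so its text would have to be extracted before filtering.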

What function should I use?

Upvotes: 3

Views: 3013

Answers (3)

sashoalm

Reputation: 79705

There is no "fastest" method for reading files. You can't get any faster than fread or its equivalents. Using threads will not help you, because hard drive I/O will be the main bottleneck anyway.

When bulk reading all the files on your hard drive, your speed will ultimately depend on the speed of the drive. It is likely that 95% of the time will be spent waiting on I/O, so multi-threading will improve speed by 5-6% at most; it will not make your program run anywhere near twice as fast.
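
For reference, a plain fread loop might look like the sketch below. The function name `read_file_fread` and the 64 KB chunk size are arbitrary choices made for illustration, not something measured.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Reads the whole file at `path` into `out` using buffered C stdio.
// Returns false if the file cannot be opened or a read error occurs.
bool read_file_fread(const std::string& path, std::string& out) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return false;

    out.clear();
    char buf[64 * 1024];  // chunk size is an arbitrary choice
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        out.append(buf, n);
    }
    bool ok = std::ferror(f) == 0;  // distinguish end-of-file from a read error
    std::fclose(f);
    return ok;
}
```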

Upvotes: -1

James Kanze

Reputation: 154047

For pure IO speed, you might want to try CreateFileMapping and MapViewOfFile. I've not measured this under Windows, but using similar techniques under Linux can result in a significant speed up.
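
A rough sketch of the idea, not a measured implementation: map the file read-only and search the mapped bytes directly. The Windows branch uses CreateFileMapping and MapViewOfFile; the other branch uses the POSIX equivalent, mmap. The function names `contains` and `mapped_file_contains` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Substring search over a raw memory range.
static bool contains(const char* data, std::size_t size, const std::string& word) {
    return std::search(data, data + size, word.begin(), word.end()) != data + size;
}

// Returns true if the file at `path` contains `word`, scanning a read-only mapping.
// Returns false when the word is absent, on any error, and for empty files
// (a zero-length file cannot be mapped).
bool mapped_file_contains(const std::string& path, const std::string& word) {
    bool found = false;
#ifdef _WIN32
    HANDLE file = CreateFileA(path.c_str(), GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;
    LARGE_INTEGER size;
    if (GetFileSizeEx(file, &size) && size.QuadPart > 0) {
        // Size arguments of 0 mean "map the whole file".
        HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
        if (mapping) {
            const void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
            if (view) {
                found = contains(static_cast<const char*>(view),
                                 static_cast<std::size_t>(size.QuadPart), word);
                UnmapViewOfFile(view);
            }
            CloseHandle(mapping);
        }
    }
    CloseHandle(file);
#else
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        std::size_t size = static_cast<std::size_t>(st.st_size);
        void* view = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (view != MAP_FAILED) {
            found = contains(static_cast<const char*>(view), size, word);
            munmap(view, size);
        }
    }
    close(fd);
#endif
    return found;
}
```

Whether this beats ReadFile or fread for files in the 10KB ~ 1MB range would need to be measured on the target machine.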

Upvotes: 1

Mats Petersson

Reputation: 129524

In my tests, memory mapping the file is the fastest way to load the content into memory, by a small margin.

The tests I performed were on Linux, but the principle should carry over: a memory-mapped file is loaded by copying the data, a page at a time, into memory that is owned by the OS (the backing memory of a memory-mapped file is owned and handled completely by the OS, so the OS can "lock" that memory in place, and so on). This is quicker than reading a piece of the file into a kernel buffer and then copying that content into the buffer provided by the application, since it avoids one copy. However, for large files (or many small files), the main limiting factor is still how quickly the hard disk can deliver data, which on my system is around 60MB/s. You can make it slower than what the system delivers, but not faster.

Upvotes: 4
