Zain Rizvi

Reputation: 24636

Reading binary files without buffering the whole file into memory in C++

In order to make a binary comparer I'm trying to read in the binary contents of two files using the CreateFileW function. However, that causes the whole file to be buffered into memory, and that becomes a problem for large (500MB) files.

I've looked around for other functions that'll let me just buffer part of the file instead, but I haven't found any documentation specifically stating how the buffer works for those functions (I'm a bit new at this so maybe I'm missing the obvious).

So far the best match I seem to have found is ReadFile. It seems to have a definable buffer but I'm not completely sure that there won't be another buffer implemented behind the scenes, like there is with CreateFileW.

Do you guys have any input on what would be a good function to use?

Upvotes: 2

Views: 4505

Answers (4)

Don Dickinson

Reputation: 6248

You could use memory-mapped files to do this. Open the file with CreateFile, then use CreateFileMapping and MapViewOfFile to get a pointer to the data.
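
A minimal sketch of those three steps, in case it helps. The helper name readViaMapping and its parameters are made up for illustration, and it assumes the offset is a multiple of the system allocation granularity (which MapViewOfFile requires). Only the requested view is mapped, not the whole file. The non-Windows branch is just a stub so the snippet compiles elsewhere.

```cpp
#include <cstddef>
#include <cstring>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: copies `length` bytes starting at `offset` out of a
// read-only mapped view of `path` into `dst`. Returns false on any failure.
// Assumption: `offset` is a multiple of the allocation granularity.
bool readViaMapping(const wchar_t* path, unsigned long long offset,
                    void* dst, std::size_t length)
{
    HANDLE hFile = ::CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                                 OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return false;

    bool ok = false;
    // Maximum size 0/0 means "the current size of the file".
    HANDLE hMap = ::CreateFileMappingW(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMap != NULL) {
        const void* view = ::MapViewOfFile(
            hMap, FILE_MAP_READ,
            static_cast<DWORD>(offset >> 32),            // high 32 bits of offset
            static_cast<DWORD>(offset & 0xFFFFFFFFull),  // low 32 bits of offset
            length);
        if (view != NULL) {
            std::memcpy(dst, view, length);  // pages are brought in on access
            ::UnmapViewOfFile(view);
            ok = true;
        }
        ::CloseHandle(hMap);
    }
    ::CloseHandle(hFile);
    return ok;
}
#else
// Win32-only sketch; this stub only keeps other platforms compiling.
bool readViaMapping(const wchar_t*, unsigned long long, void*, std::size_t)
{
    return false;
}
#endif
```

For a comparer you would map a window of each file this way, compare the two windows, then move on to the next offset.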

Upvotes: 7

Matthew Xavier

Reputation: 2128

Calling CreateFile() does not itself buffer or otherwise read the contents of the target file. After calling CreateFile(), you must call ReadFile() to obtain whatever parts of the file you want. For example, to read the first kilobyte of a file:

DWORD cbRead;
BYTE buffer[1024];
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL);  // cbRead receives the number of bytes read
::CloseHandle(hFile);

In addition, if you want to read an arbitrary portion of the file, you can call SetFilePointer() before calling ReadFile(). For example, to read one kilobyte starting one megabyte into the file:

DWORD cbRead;
BYTE buffer[1024];
HANDLE hFile = ::CreateFile(filename,
                            GENERIC_READ,
                            FILE_SHARE_READ,
                            NULL,
                            OPEN_EXISTING,
                            FILE_ATTRIBUTE_NORMAL,
                            NULL);
::SetFilePointer(hFile, 1024 * 1024, NULL, FILE_BEGIN);
::ReadFile(hFile, buffer, sizeof(buffer), &cbRead, NULL);  // cbRead receives the number of bytes read
::CloseHandle(hFile);

You may, of course, call SetFilePointer() and ReadFile() as many times as you wish while the file is open. A call to ReadFile() implicitly sets the file pointer to the byte immediately following the last byte read by ReadFile().
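
For the binary comparer in the question, the same read-a-chunk-at-a-time idea could look roughly like this. Note that this sketch swaps in portable std::ifstream instead of the Win32 calls above, and the name filesEqualChunked and the 64 KB default chunk size are just assumptions for illustration. At most two chunk-sized buffers are held in memory, regardless of file size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true if both files have identical contents.
// Reads each file in chunks of `chunkSize` bytes rather than all at once.
bool filesEqualChunked(const std::string& pathA, const std::string& pathB,
                       std::size_t chunkSize = 64 * 1024)
{
    assert(chunkSize > 0);
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b) return false;  // a file that cannot be opened counts as "not equal"

    std::vector<char> bufA(chunkSize), bufB(chunkSize);
    for (;;) {
        a.read(bufA.data(), static_cast<std::streamsize>(chunkSize));
        b.read(bufB.data(), static_cast<std::streamsize>(chunkSize));
        const std::streamsize gotA = a.gcount();  // bytes actually read
        const std::streamsize gotB = b.gcount();

        if (gotA != gotB) return false;  // one file ended before the other
        if (gotA == 0) return true;      // both ended with no difference found
        if (std::memcmp(bufA.data(), bufB.data(),
                        static_cast<std::size_t>(gotA)) != 0) {
            return false;                // contents differ within this chunk
        }
    }
}
```

With the Win32 calls, each read() would become a ::ReadFile() call on the corresponding handle, and gcount() would be the cbRead value it reports.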

Additionally, you should read the documentation for the File Management Functions you use, and check the return values appropriately to trap any errors that might occur.

Windows may, at its discretion, use available system memory to cache the contents of open files, but data cached by this process will be discarded if the memory is needed by a running program (after all, the cached data can just be re-read from the disk if it is needed).

Upvotes: 5

Michael

Reputation: 55385

Not sure what you mean by CreateFile buffering - CreateFile won't read in the entire contents of the file, and besides, you need to call CreateFile before you can call ReadFile.

ReadFile will do what you want - the OS may do some read-ahead to opportunistically cache data, but it will not read the entire 500 MB file in.

If you really want to have no buffering, pass FILE_FLAG_NO_BUFFERING to CreateFile, and ensure that your file access offsets and sizes are multiples of the volume sector size. I strongly suggest you do not do this - the system file cache exists for a reason and helps with performance. Caching files in memory should have no effect on the overall system's memory usage - under memory pressure the system file cache will shrink.
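
To make the sector-size requirement concrete, here is a small sketch of the arithmetic only. The struct and function names are made up, and the sector size is assumed to be passed in (on Windows it can be queried with GetDiskFreeSpace). The function widens a requested byte range so that both its start and its length are sector multiples. The buffer handed to ReadFile would also need to be suitably aligned, which this sketch does not cover.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: a byte range whose offset and length are both
// multiples of the sector size.
struct AlignedRange {
    std::uint64_t offset;
    std::uint64_t length;
};

// Expands [offset, offset + length) outward to the nearest sector boundaries.
AlignedRange alignToSectors(std::uint64_t offset, std::uint64_t length,
                            std::uint64_t sectorSize)
{
    assert(sectorSize > 0);
    const std::uint64_t start = offset - (offset % sectorSize);  // round down
    const std::uint64_t end = offset + length;
    const std::uint64_t alignedEnd =
        ((end + sectorSize - 1) / sectorSize) * sectorSize;      // round up
    return AlignedRange{start, alignedEnd - start};
}
```

For example, asking for 100 bytes at offset 1000 with 512-byte sectors means actually reading 1024 bytes starting at offset 512, then picking the wanted bytes out of that buffer.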

As others have mentioned, you can use memory-mapped files as well. The difference between memory-mapped files and ReadFile is mainly just the interface - ultimately the file manager will satisfy the requests in a similar manner, including some buffering. The memory-mapped interface appears to be a bit more intuitive, but be aware that any errors that occur will result in an exception that must be caught; otherwise it will crash your program.

Upvotes: 5

Drew Hoskins

Reputation: 4186

I believe you want MapViewOfFile.

Upvotes: 1
