user3395981

Reputation: 41

How to increase speed of reading data on Windows using c++

I am reading blocks of data from a volume snapshot using CreateFile/ReadFile and a buffer size of 4096 bytes. The problem I am facing is that ReadFile is too slow: I can only read 68439 blocks, i.e. 267 MB, in 45 seconds. How can I increase the speed? Below is the relevant part of my code:

block_handle = CreateFile(block_file,GENERIC_READ,FILE_SHARE_READ,0,OPEN_EXISTING,FILE_FLAG_SEQUENTIAL_SCAN,NULL);
if(block_handle != INVALID_HANDLE_VALUE)
{
    DWORD pos = -1;
    for(ULONG i = 0; i < 68439; i++)
    {
        sectorno = (i*8);
        distance = sectorno * sectorsize;
        phyoff.QuadPart = distance;     
        if(pos != phyoff.u.LowPart)
        {
             pos=SetFilePointer(block_handle, phyoff.u.LowPart,&phyoff.u.HighPart,FILE_BEGIN);
             if (pos == INVALID_SET_FILE_POINTER && GetLastError() != NO_ERROR)
             {
                 printf("SetFilePointer Error: %d\n", GetLastError());
                 phyoff.QuadPart = -1;
                 return;
             }
        }
        ret = ReadFile(block_handle, data, 4096, &dwRead, 0);
        if(ret == FALSE)
        {
            printf("Error Read");
            return;
        }
        pos += 4096;
    }
}

Should I use an OVERLAPPED structure, or what else could be the solution? Note: the code is not threaded.

Awaiting a positive response.

Upvotes: 0

Views: 2230

Answers (3)

Adrian McCarthy

Reputation: 47954

  1. If possible, read sequentially (and tell CreateFile you intend to read sequentially with FILE_FLAG_SEQUENTIAL_SCAN).
  2. Avoid unnecessary seeks. If you're reading sequentially, you shouldn't need any seeks.
  3. Read larger chunks (like an integer multiple of the typical cluster size). I believe Windows's own file copy uses reads on the order of 8 MB rather than 4 KB. Consider using an integer multiple of the system's allocation granularity (available from GetSystemInfo).
  4. Read from aligned offsets (you seem to be doing this).
  5. Read to a page-aligned buffer. Consider using VirtualAlloc to allocate the buffer.
  6. Be aware that fragmentation of the file can cause expensive seeking. There's not much you can do about this.
  7. Be aware that volume compression can make seeks especially expensive because it may have to decompress the file from the beginning to find the starting point in the middle of the file.
  8. Be aware that volume encryption might slow things down. Not much you can do but be aware.
  9. Be aware that other software, like anti-malware, may be scanning the entire file every time you touch it. Fewer operations will minimize this hit.
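The first five points combine naturally into one loop. Below is a hedged sketch of that pattern, written portably so it can be compiled anywhere (std::fopen/std::fread and C++17 std::aligned_alloc stand in for CreateFile with FILE_FLAG_SEQUENTIAL_SCAN and VirtualAlloc; MSVC spells the latter _aligned_malloc). The 8 MB chunk size is an assumption to tune, not a measured optimum:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Read a file sequentially in large, page-aligned chunks.
// Returns the total number of bytes read, or 0 on failure.
std::size_t read_sequential(const char* path)
{
    const std::size_t kAlign = 4096;            // typical page size
    const std::size_t kChunk = 8 * 1024 * 1024; // 8 MB per read, not 4 KB
    // aligned_alloc requires the size to be a multiple of the alignment.
    void* buf = std::aligned_alloc(kAlign, kChunk);
    if (!buf) return 0;

    std::FILE* f = std::fopen(path, "rb");
    if (!f) { std::free(buf); return 0; }

    std::size_t total = 0;
    for (;;) {
        // One large read replaces many small ones; no explicit seeks are
        // needed because the file position advances on its own.
        std::size_t n = std::fread(buf, 1, kChunk, f);
        total += n;
        if (n < kChunk) break; // EOF or error
    }
    std::fclose(f);
    std::free(buf);
    return total;
}
```

The same shape carries over to ReadFile directly: one handle opened with FILE_FLAG_SEQUENTIAL_SCAN, one VirtualAlloc'd buffer, and a loop with no SetFilePointer calls.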

Upvotes: 1

ravenspoint

Reputation: 20457

Your problem is the fragmented data reads. You cannot solve this by fiddling with ReadFile parameters. You need to defragment your reads. Here are three approaches:

  1. Defragment the data on the disk

  2. Defragment the reads. That is, collect all the reads you need, but do not read anything yet. Sort the reads into order, then read everything in order, skipping SetFilePointer wherever possible (i.e. for sequential blocks). This will greatly speed up the total read, but introduces a lag before the first read starts.

  3. Memory-map the data. Copy ALL the data into memory and do random-access reads from memory. Whether this is possible depends on how much data there is in total.
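The second approach (sort, then merge sequential blocks) can be sketched independently of any file I/O. The block numbers and the coalesce helper below are illustrative assumptions, not the poster's actual read list, and the input is assumed to contain no duplicate block numbers:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Sort pending block reads and merge consecutive blocks into runs,
// so each run costs one seek followed by one large sequential read.
// Each run is (first block number, number of blocks).
std::vector<std::pair<std::uint64_t, std::uint64_t>>
coalesce(std::vector<std::uint64_t> blocks)
{
    std::vector<std::pair<std::uint64_t, std::uint64_t>> runs;
    if (blocks.empty()) return runs;
    std::sort(blocks.begin(), blocks.end());
    runs.emplace_back(blocks[0], 1);
    for (std::size_t i = 1; i < blocks.size(); ++i) {
        auto& last = runs.back();
        if (blocks[i] == last.first + last.second)
            ++last.second;                   // contiguous: extend the run
        else
            runs.emplace_back(blocks[i], 1); // gap: start a new run
    }
    return runs;
}
```

With the runs in hand, the read loop issues one SetFilePointer per run instead of one per block.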

Also, you might want to get fancy and experiment with caching. When you read a block of data, it might be that although the next read is not sequential, it has a high probability of being close by. So when you read a block, sequentially read an enormous block of nearby data into memory. Before the next read, check if the new read is already in memory, thus saving a seek and a disk access. Testing, debugging and tuning this is a lot of work, so I do not really recommend it unless this is a mission-critical optimization. Also note that your OS and/or your disk hardware may already be doing something along these lines, so be prepared to see no improvement whatsoever.

Upvotes: 1

CaptainCodeman

Reputation: 2201

I'm not quite sure why you're using these extremely low level system functions for this.

Personally I have used C-style file operations (fopen and fread) as well as C++-style operations (fstream and its read method, see this link) to read raw binary files. From a local disk, the read speed is on the order of 100 MB/second.

In your case, if you don't want to use the standard C or C++ file operations, my guess is that your code is slower because you perform a seek after each block. Do you really need to call SetFilePointer for every block? If the blocks are sequential, you shouldn't need to.

Also, experiment with different block sizes; don't be afraid to go up to 1 MB and beyond.
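A minimal sketch of that advice using std::ifstream; the 1 MB default block size is just a starting point for experimentation, not a recommendation:

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Read an entire file through std::ifstream in large blocks.
// Returns the byte count read, or 0 if the file cannot be opened.
std::size_t read_in_blocks(const char* path, std::size_t block = 1 << 20)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return 0;
    std::vector<char> buf(block);
    std::size_t total = 0;
    // read() fails on the final, partial block, but gcount() still
    // reports how many bytes it delivered before hitting EOF.
    while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
        total += static_cast<std::size_t>(in.gcount());
        // process buf[0 .. gcount()) here
    }
    return total;
}
```

Timing this function with block sizes from 64 KB up to several MB is an easy way to find the sweet spot for a given disk.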

Upvotes: 1
