Reputation: 517
I want my code to process a file very fast. The file size will vary from a few KB up to 2 GB.
I am even willing to create a separate file system just for that single file.
I will split the file into constant-size blocks (probably 8 KB) and access them for reads and writes. Code-wise, the algorithm cannot be changed because it performs well and is stable, so I don't want to touch it. I am also using mmap() to map blocks into memory on demand.
Is it possible to treat a file system as a single block so that file access and read/write operations become faster?
Please share any suggestions, even small things that might help.
Suggestions for any platform or file system are welcome.
Thanks, Naga
Upvotes: 2
Views: 2108
Reputation: 1378
General, OS-independent rules:
Use physical reads (rather than streams)
Use large I/O buffers for your reads. Initializing an I/O operation (and syncing with the spinning hardware) is costly; several small reads take longer than one large read.
Create a benchmark to figure out the most efficient buffer size. Beyond a given size, efficiency no longer improves, and you don't want to gobble up all your precious RAM needlessly. The optimal buffer size depends on your hardware and OS; on current hardware, buffer sizes in the 500 KB to 1 MB range are usually efficient enough (see the sketch after this list).
Minimize disk head seeks. I.e., if you have to write the data back, alternating reads and writes on the same physical disk can be very costly.
If you have significant processing to do, use double buffering and asynchronous I/O to overlap I/O and processing.
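A minimal sketch of the large-buffer approach, assuming a plain POSIX read() loop; the 1 MB buffer size and the command-line file argument are just illustrative, and the right size should come from your own benchmark:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Illustrative buffer size; benchmark on your own hardware and OS. */
#define BUF_SIZE (1024 * 1024)

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(BUF_SIZE);
    if (!buf) { perror("malloc"); close(fd); return 1; }

    ssize_t n;
    unsigned long long total = 0;
    /* One large read per system call instead of many small ones. */
    while ((n = read(fd, buf, BUF_SIZE)) > 0) {
        total += (unsigned long long)n;
        /* ... process buf[0..n) here ... */
    }
    if (n < 0) perror("read");

    printf("read %llu bytes\n", total);
    free(buf);
    close(fd);
    return 0;
}
```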
Upvotes: 1
Reputation: 86818
Always try to access your file sequentially, in blocks of 64kB-1MB. That way you can take advantage of prefetching and maximize the amount of data per I/O operation.
Also, try to make sure that the file is contiguous in the first place so that the disk head doesn't have to move much between sequential reads. Many filesystems will create a file as contiguously as possible if you start out by setting the end of file or doing a write() of the whole file at once. On Windows you can use the sysinternals.com utility contig.exe to make a file contiguous.
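A minimal sketch of preallocating the full file up front, assuming a POSIX system with posix_fallocate(); the file name "data.bin" and the 2 GB size are just illustrative. Reserving the whole extent in one call gives the filesystem a chance to lay the file out contiguously:

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t on 32-bit systems */

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const off_t size = 2LL * 1024 * 1024 * 1024;  /* illustrative 2 GB */

    int fd = open("data.bin", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Reserve the full extent up front so the filesystem can try to
       allocate it contiguously. Returns an error number, not -1/errno. */
    int err = posix_fallocate(fd, 0, size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}
```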
Upvotes: 0
Reputation: 34218
Windows permits you to open a partition for raw reads and writes. It will also let you open a physical device for raw I/O. So if you are willing to treat a hard disk or a partition as a single file, you are guaranteed that the 'file' is logically contiguous on disk. (Because of the way hard disks hotfix bad sectors, it may not actually be physically contiguous.)
If you choose to do raw I/O, you will have to read and write in multiples of the block size of the device. This is usually 512 bytes, but it would probably be wiser to use 4 KB as your block size, since that is what newer disks use and it is also the page size for Win32.
To open a partition for raw reads, you use CreateFile with the filename "\\.\X:", where X: is the drive letter of the partition. See the CreateFile documentation under the section heading Physical Disks and Volumes.
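A minimal Win32 sketch of opening a volume this way, assuming drive letter X: exists and the process has the rights to open it (typically administrator); the 4 KB block size is illustrative, and VirtualAlloc is used to get a page-aligned buffer, since raw volume reads must be sector-aligned:

```c
#include <windows.h>
#include <stdio.h>

#define BLOCK_SIZE 4096  /* multiple of the device sector size */

int main(void)
{
    /* "\\.\X:" opens the X: volume for raw access; X: is just an example. */
    HANDLE h = CreateFileA("\\\\.\\X:", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    /* Page-aligned buffer; reads must be sector-aligned and a multiple
       of the sector size. */
    void *buf = VirtualAlloc(NULL, BLOCK_SIZE, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
    if (!buf) { CloseHandle(h); return 1; }

    DWORD got = 0;
    if (!ReadFile(h, buf, BLOCK_SIZE, &got, NULL)) {
        fprintf(stderr, "ReadFile failed: %lu\n", GetLastError());
    } else {
        printf("read %lu bytes from the raw volume\n", got);
    }

    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(h);
    return 0;
}
```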
On the other hand, it's pretty hard to beat the performance of memory-mapped files; see this question for an example: How to scan through really huge files on disk?
Upvotes: 0
Reputation: 204994
mmap or MapViewOfFile let you access files directly in memory. The OS will transparently fault in pages as needed, or possibly even read ahead (which can be hinted at with madvise or FILE_FLAG_* flags). Depending on your access pattern and the size of your files, this could be noticeably faster than reading/writing the files normally.
On the downside, you will have to worry a bit more about consistency (make sure to use msync or FlushViewOfFile with care), and because of the page-table manipulations necessary, it might be slower too.
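A minimal POSIX sketch along these lines, mapping a file, hinting sequential access with madvise, touching each 8 KB block, and flushing dirty pages with msync before unmapping; the file name "data.bin" is just illustrative, and the 8 KB stride matches the block size mentioned in the question:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        fprintf(stderr, "cannot stat file or file is empty\n");
        close(fd);
        return 1;
    }

    /* Map the whole file; pages are faulted in on demand. */
    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Hint the kernel that access will be sequential (enables read-ahead). */
    madvise(p, st.st_size, MADV_SEQUENTIAL);

    /* Example access: touch the first byte of each 8 KB block. */
    for (off_t off = 0; off < st.st_size; off += 8192)
        p[off] ^= 1;

    /* Make sure dirty pages reach the file before unmapping. */
    if (msync(p, st.st_size, MS_SYNC) < 0) perror("msync");

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```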
Upvotes: 0