Reputation: 20620
I have a 1 GB binary file which basically contains a 3D cube of values of the same type. Saving this cube in a different order ([x, y, z] or [z, x, y]) takes a lot of time with fseek and fwrite. But one software package does this a lot faster than my program. Is there any approach that makes file writing faster than fseek/fwrite?
Upvotes: 5
Views: 3056
Reputation: 33
If you are doing a lot of random-access writing, I suggest you use mmap. mmap maps your file into memory pages, and the OS decides when the modified pages are written back to disk, similar to the memory swap mechanism.
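A minimal sketch of the mmap approach on a POSIX system. The function name `write_cube_zyx_mmap` is made up for illustration, and the layout is an assumption: the source cube has x varying fastest and z slowest, and the file should have z varying fastest and x slowest.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes an nx*ny*nz cube of ints to `path`.
 * Assumed layouts: src has x fastest, z slowest; the file gets z fastest,
 * x slowest. Returns 0 on success, -1 on failure. */
int write_cube_zyx_mmap(const char *path, const int *src,
                        size_t nx, size_t ny, size_t nz)
{
    assert(src != NULL && nx > 0 && ny > 0 && nz > 0);
    size_t bytes = nx * ny * nz * sizeof(int);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* The file must already have its final size before it is mapped. */
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return -1;
    }

    int *dst = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (dst == MAP_FAILED)
        return -1;

    /* Plain memory stores; the OS writes the dirty pages back to the file. */
    for (size_t ix = 0; ix < nx; ix++)
        for (size_t iy = 0; iy < ny; iy++)
            for (size_t iz = 0; iz < nz; iz++)
                dst[(ix * ny + iy) * nz + iz] = src[(iz * ny + iy) * nx + ix];

    int rc = msync(dst, bytes, MS_SYNC); /* flush before unmapping */
    if (munmap(dst, bytes) != 0)
        rc = -1;
    return rc;
}
```

The loops walk the destination in file order, so the mapped pages are touched sequentially. Whether this beats buffered fwrite depends on the system, so it is worth measuring.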
Another way is to use asynchronous I/O, which glibc provides: http://www.gnu.org/software/libc/manual/html_node/Asynchronous-I_002fO.html
It simply puts the data in a queue in memory and then creates another thread to manage the I/O.
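A rough sketch of a single queued write with the POSIX AIO interface (`aio_write`, `aio_suspend`, `aio_error`, `aio_return`). The wrapper name `write_async` is hypothetical, and the sketch waits for completion right away, which a real program would replace with useful work. Older glibc versions may need linking with `-lrt`.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>   /* open() flags for the caller */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>  /* close() for the caller */

/* Hypothetical wrapper: queues one write of `len` bytes at file offset
 * `off`, then waits until it has finished. Returns the number of bytes
 * written, or -1 on failure. */
ssize_t write_async(int fd, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL);

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (void *)buf; /* the buffer must stay alive until completion */
    cb.aio_nbytes = len;
    cb.aio_offset = off;

    if (aio_write(&cb) != 0) /* only queues the request */
        return -1;

    /* A real program would prepare the next block here instead of waiting. */
    const struct aiocb *pending[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(pending, 1, NULL);

    int err = aio_error(&cb);
    ssize_t written = aio_return(&cb); /* call exactly once per request */
    return err == 0 ? written : -1;
}
```

The benefit only shows up when the next block is reordered in memory while the previous one is still being written.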
Upvotes: 1
Reputation: 11499
You should not use fseek in the inner loop of file I/O operations. For the writing functions to be fast, they cache the writes. If you seek all over the place, you keep blowing the cache.
Do all your transformations in memory, e.g. rotate the cube in memory, and then write the file in a few sequential fwrite calls.
If you can't transform your data completely in memory, then assemble your cube one plane at a time in memory and write out each plane.
@edit:
In your case you don't want to use fseek at all. Not even one.
Do something like this:
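#include <stdio.h>
#include <stdlib.h>

// Assumed helper, not shown here: reorders the cube purely in memory.
void transformCubeXYZ_to_ZYX( const int* cubeXYZ, int* cubeZYX, size_t sizeOfCubeXYZ );

void writeCubeZYX( const int* cubeXYZ, size_t sizeOfCubeXYZ, FILE* file )
{
    int* cubeZYX = malloc( sizeOfCubeXYZ ); // sizeOfCubeXYZ is a byte count
    if ( cubeZYX == NULL ) return;
    // all that monkey business you're doing with fseek is done inside this
    // function copying memory to memory. No file IO operations in here.
    transformCubeXYZ_to_ZYX( cubeXYZ, cubeZYX, sizeOfCubeXYZ );
    // one big fat very fast fwrite. Optimal use of file io cache.
    fwrite( cubeZYX, 1, sizeOfCubeXYZ, file ); // fwrite takes (ptr, size, count, stream)
    free( cubeZYX ); // quiet pedantry.
}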
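For illustration, here is one way the in-memory transform could look. This is a sketch under assumptions the answer does not state: the source has x varying fastest and z slowest, the output has z varying fastest and x slowest, and the names `transform_xyz_to_zyx` and `write_cube_zyx` are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical transform. Assumed layouts:
 *   src index = (iz * ny + iy) * nx + ix   (x fastest, z slowest)
 *   dst index = (ix * ny + iy) * nz + iz   (z fastest, x slowest) */
void transform_xyz_to_zyx(const int *src, int *dst,
                          size_t nx, size_t ny, size_t nz)
{
    assert(src != NULL && dst != NULL);
    for (size_t ix = 0; ix < nx; ix++)
        for (size_t iy = 0; iy < ny; iy++)
            for (size_t iz = 0; iz < nz; iz++)
                dst[(ix * ny + iy) * nz + iz] = src[(iz * ny + iy) * nx + ix];
}

/* Reorders the whole cube in memory, then writes it with a single fwrite.
 * Returns 0 on success, -1 on failure. */
int write_cube_zyx(const int *src, size_t nx, size_t ny, size_t nz, FILE *file)
{
    size_t count = nx * ny * nz;
    int *dst = malloc(count * sizeof *dst);
    if (dst == NULL)
        return -1;

    transform_xyz_to_zyx(src, dst, nx, ny, nz);
    size_t written = fwrite(dst, sizeof *dst, count, file);

    free(dst);
    return written == count ? 0 : -1;
}
```

This needs a second buffer as large as the cube, so for a 1 GB cube it takes roughly 2 GB of memory in total.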
@edit2:
OK, suppose you can't transform it all in memory. Then transform it in planes and write out one plane at a time, in file order, that is, with no fseeks.
So say an [XYZ] cube is laid out in memory as a series of Z [XY] matrices. That is, the [XY] planes of your cube are contiguous in memory. And you want to write it out as [ZYX]. So in the file you want to write out a series of X [ZY] matrices. Each [ZY] matrix will be contiguous in the file.
So you do something like this:
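#include <stdio.h>
#include <stdlib.h>

// Assumed helper, not shown here: copies the ZY plane at X index i into planeZY.
void extractZYPlane_from_CubeXYZ( const int* cubeXYZ, int* planeZY, int i, int x, int y, int z );

void writeCubeZYX( const int* cubeXYZ, int x, int y, int z, FILE* file )
{
    size_t sizeOfPlaneZY = sizeof( int ) * y * z;
    int* planeZY = malloc( sizeOfPlaneZY );
    if ( planeZY == NULL ) return;
    for ( int i = 0; i < x; i++ )
    {
        // all that monkey business you're doing with fseek is done inside this
        // function extracting one ZY plane at a time. No file IO operations in here.
        extractZYPlane_from_CubeXYZ( cubeXYZ, planeZY, i, x, y, z );
        // in x big fat very fast fwrites. Near optimal use of file io cache.
        fwrite( planeZY, 1, sizeOfPlaneZY, file ); // fwrite takes (ptr, size, count, stream)
    }
    free( planeZY ); // quiet pedantry.
}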
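A possible plane extraction, sketched under the same layout reading as the paragraph above plus one assumption it leaves open: x varies fastest inside each [XY] matrix, and z varies fastest inside each [ZY] matrix in the file. The names `extract_zy_plane` and `write_cube_zyx_by_plane` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copies the ZY plane at X index ix into `plane`.
 * Assumed source index: (iz * ny + iy) * nx + ix (x fastest, z slowest).
 * The plane is stored with z varying fastest. */
void extract_zy_plane(const int *src, int *plane, size_t ix,
                      size_t nx, size_t ny, size_t nz)
{
    assert(src != NULL && plane != NULL && ix < nx);
    for (size_t iy = 0; iy < ny; iy++)
        for (size_t iz = 0; iz < nz; iz++)
            plane[iy * nz + iz] = src[(iz * ny + iy) * nx + ix];
}

/* Writes the cube as nx consecutive ZY planes without any fseek.
 * Returns 0 on success, -1 on failure. */
int write_cube_zyx_by_plane(const int *src, size_t nx, size_t ny, size_t nz,
                            FILE *file)
{
    size_t plane_count = ny * nz;
    int *plane = malloc(plane_count * sizeof *plane);
    if (plane == NULL)
        return -1;

    int rc = 0;
    for (size_t ix = 0; ix < nx && rc == 0; ix++) {
        extract_zy_plane(src, plane, ix, nx, ny, nz);
        if (fwrite(plane, sizeof *plane, plane_count, file) != plane_count)
            rc = -1; /* stop at the first short write */
    }

    free(plane);
    return rc;
}
```

Only one plane of extra memory is needed. The reads from the source cube are strided, which is the price for keeping the file writes strictly sequential.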
Upvotes: 7
Reputation: 15944
If you don't mind having the file on disk be a compressed file, then it may be faster to compress it as you write it. This speeds things up because the bottleneck is usually writing bytes to disk, and by compressing as you write you reduce the number of bytes you need to write.
This will of course depend on whether your data is amenable to compression. One option for compressing output in C++ is gzip; see, e.g., the question "How do I read / write gzipped files?"
But in your case, this may not be applicable -- It's unclear from your question exactly when/why you're fseeking. What is your expected pattern of writes?
Upvotes: 0