Misery

Reputation: 689

How to write a large binary file to a disk

I am writing a program that needs to write a large binary file (about 12 GiB or more) to disk. I created a small test program to test this functionality. Allocating the memory for the buffer is not a problem, but my program does not write the data to the file: the file remains empty, even for 3.72 GiB files.

    //size_t bufferSize=1000; //ok
    //size_t bufferSize=100000000; //ok
    size_t bufferSize=500000000; //fails although it is under 4GiB, which shouldn't cause problems anyway
    double mem=double(bufferSize)*double(sizeof(double))/std::pow(1024.,3.);
    cout<<"Total memory used: "<<mem<<" GiB"<<endl;

    double *buffer=new double[bufferSize];
/* //enable if you want to fill the buffer with random data
    printf("\r[%3d%%]", 0);

    for (size_t i = 0; i < bufferSize; i++)
    {
        if ((i+1) % 100 == 0)
            printf("\r[%3zu%%]", (size_t)(100.*double(i+1)/bufferSize));
        buffer[i] = rand() % 100;
    }
*/
    cout<<endl;

     std::ofstream outfile ("largeStuff.bin", std::ofstream::binary);
     outfile.write (reinterpret_cast<const char*>(buffer), bufferSize * sizeof(double));

     outfile.close();

    delete[] buffer;

Upvotes: 0

Views: 786

Answers (2)

It seems that you want to have a buffer that contains the whole file's contents prior to writing it.

You're doing it wrong, though: the virtual memory requirements are essentially double what they need to be. Your process retains the buffer, and when you write that buffer to disk it gets duplicated in the operating system's buffers. Most OSes will notice that you write sequentially and may discard their buffers quickly, but it is still rather wasteful.

Instead, you should create an empty file, grow it to its desired size, map a view of it into memory, and make your modifications through that view. On 32-bit hosts the file size is limited to under 1 GB; on 64-bit hosts it is limited only by the filesystem. On modern hardware, creating and filling a 1 GB file that way takes on the order of one second (!) if you have enough free RAM available.

Thanks to the wonders of RAII, you don't need to do anything special to release the mapped memory, or to close/finalize the file. By leveraging boost you can avoid writing platform-specific code, too.

// https://github.com/KubaO/stackoverflown/tree/master/questions/mmap-boost-40308164
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/filesystem.hpp>
#include <cassert>
#include <cstdint>
#include <fstream>

namespace bip = boost::interprocess;

void fill(const char * fileName, size_t size) {
    using element_type = uint64_t;
    assert(size % sizeof(element_type) == 0);
    std::ofstream().open(fileName); // create an empty file
    boost::filesystem::resize_file(fileName, size);
    auto mapping = bip::file_mapping{fileName, bip::read_write};
    auto mapped_rgn = bip::mapped_region{mapping, bip::read_write};
    const auto mmaped_data = static_cast<element_type*>(mapped_rgn.get_address());
    const auto mmap_bytes = mapped_rgn.get_size();
    const auto mmap_size = mmap_bytes / sizeof(*mmaped_data);
    assert(mmap_bytes == size);

    element_type n = 0;
    for (auto p = mmaped_data; p < mmaped_data+mmap_size; ++p)
       *p = n++;
}

int main() {
   const uint64_t G = 1024ULL*1024ULL*1024ULL;
   fill("tmp.bin", 1*G);
}

Upvotes: 2

silverscania

Reputation: 707

I actually compiled and ran the code exactly as you pasted it, and it works: it creates a 4 GB file.

If you are on a FAT32 filesystem, the maximum file size is 4 GB.

Otherwise I suggest you check:

  • The amount of free disk space you have.
  • Whether your user account has any disk usage limits in place.
  • The amount of free RAM you have.
  • Whether there are any runtime errors.
  • @enhzflep's suggestion about the number of prints (although that is commented out)

Upvotes: 2
