Tiago.SR

Reputation: 369

Loading large amount of binary data into RAM

My application needs to load from megabytes to dozens of gigabytes of binary data (multiple files) into RAM. After some searching, I decided to use std::vector<unsigned char> for this purpose, although I am not sure it's the best choice.

I would use one vector per file. As the application knows the file size in advance, it would call reserve() to allocate memory for it. Sometimes the application needs to read a file fully, sometimes only part of it, and vector's iterators are nice for that. It may also need to unload a file from RAM and put another in its place; std::vector::swap() and std::vector::shrink_to_fit() would be very useful for that. I don't want the hard work of dealing with low-level memory allocation stuff (otherwise I would go with C).

I have some questions:

- How would it know if there is enough memory space to load one more file?
- What about implementation limitations? Is there anything regarding implementations that could prevent me from doing what I want to?

Note: this section of the application must get the best performance possible (low processing time/CPU usage and RAM consumption). I would appreciate your help.

Upvotes: 3

Views: 1066

Answers (2)

eerorika

Reputation: 238351

How would it know if there is enough memory space to load one more file?

You wouldn't know beforehand. Wrap the loading process in a try-catch block. If memory runs out, a std::bad_alloc will be thrown (assuming you use the default allocator). Assume that memory is sufficient in the loading code, and deal with the lack of memory in the exception handler.
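For illustration, a minimal sketch of that pattern might look like this (the load_file name, the resize-then-read approach, and the error handling policy are assumptions made for the example, not something prescribed by the answer):

    #include <cstddef>
    #include <fstream>
    #include <new>
    #include <string>
    #include <vector>

    // Load a whole file into a vector, reporting failure instead of
    // crashing when memory runs out.
    bool load_file(const std::string& path, std::vector<unsigned char>& out)
    {
        std::ifstream in(path, std::ios::binary);
        if (!in)
            return false;                                 // I/O problem, not memory

        in.seekg(0, std::ios::end);
        const std::streamoff size = in.tellg();
        in.seekg(0, std::ios::beg);

        try {
            out.resize(static_cast<std::size_t>(size));   // may throw std::bad_alloc
            in.read(reinterpret_cast<char*>(out.data()),
                    static_cast<std::streamsize>(size));
            return static_cast<bool>(in);
        } catch (const std::bad_alloc&) {
            out.clear();
            out.shrink_to_fit();                          // release whatever was allocated
            return false;                                 // caller decides what to unload or retry
        }
    }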

But what about implementation limitations? ... Is there anything regarding implementations that could prevent me from doing what I want to?

You can check std::vector::max_size() at run time to verify.

If the program is compiled with a 64-bit word size, then it is quite likely that the vector's max_size is sufficient for a few hundred gigabytes.
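A trivial sketch of that check (the number printed is purely implementation-defined; max_size() is a theoretical upper bound, not a promise that this much memory can actually be allocated):

    #include <iostream>
    #include <vector>

    int main()
    {
        std::vector<unsigned char> v;
        // Upper bound on element count imposed by the implementation.
        std::cout << "max_size: " << v.max_size() << " bytes\n";
    }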


This section of the application must get the best performance possible

This conflicts with

I don't want the hard work of dealing with low-level memory allocation stuff

But in case low-level memory handling is worth it for the performance, you could memory-map the file.


I've read in some SO questions to avoid them in applications that need high performance and to prefer dealing with return values, errno, etc.

Unfortunately for you, non-throwing memory allocation is not an option if you use the standard containers. If you are allergic to exceptions, then you must use another implementation of a vector - or whatever container you decide to use. You don't need any container with mmap, though.

Won't handling exceptions break performance?

Luckily for you, the run-time cost of exceptions is insignificant compared to reading hundreds of gigabytes from disk.

May it be better to run sysinfo() and work on checking free RAM before loading a file?

A sysinfo call may very well be slower than handling an exception (I haven't measured; that is just a conjecture), and it won't tell you about process-specific limits that may exist.

Also, it looks hard and costly to repetitively try to load a file, catch the exception and try to load a smaller file (requires recursion?)

No recursion needed. You can use it if you prefer; it can be written as a tail call, which can be optimized away.
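A sketch of the loop-based alternative, reusing the hypothetical load_file() from the earlier example (the ordering of candidates and the fallback policy are assumptions for illustration):

    #include <string>
    #include <vector>

    // Try candidate files from largest to smallest until one fits in memory.
    std::vector<unsigned char> load_first_that_fits(const std::vector<std::string>& candidates)
    {
        std::vector<unsigned char> data;
        for (const std::string& path : candidates) {   // largest file first
            if (load_file(path, data))                 // handles std::bad_alloc internally
                break;                                 // success: keep this one
            // otherwise fall through and try the next (smaller) file
        }
        return data;
    }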


About memory mapping: I took a look at it some time ago and found it boring to deal with. It would require using C's open() and all that stuff, and saying goodbye to std::fstream.

Once you have mapped the file, it is easier to use than std::fstream. You can skip the copy-into-vector step and simply use the mapped memory as if it were an array that already exists in memory.
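A minimal POSIX sketch of that approach (the file name is illustrative and error handling is kept to a bare minimum):

    #include <fcntl.h>      // open
    #include <sys/mman.h>   // mmap, munmap
    #include <sys/stat.h>   // fstat
    #include <unistd.h>     // close

    #include <cstddef>
    #include <iostream>

    int main()
    {
        const char* path = "data.bin";          // illustrative file name
        int fd = ::open(path, O_RDONLY);
        if (fd == -1) return 1;

        struct stat st{};
        if (::fstat(fd, &st) == -1) { ::close(fd); return 1; }
        const std::size_t size = static_cast<std::size_t>(st.st_size);

        void* p = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);                            // the mapping stays valid after close
        if (p == MAP_FAILED) return 1;

        // Use the mapping like an in-memory array; pages are faulted in on demand.
        const unsigned char* bytes = static_cast<const unsigned char*>(p);
        unsigned long long sum = 0;
        for (std::size_t i = 0; i < size; ++i)
            sum += bytes[i];
        std::cout << "checksum: " << sum << '\n';

        ::munmap(p, size);
    }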

It looks like the best way of partially reading a file using std::fstream is to derive from std::streambuf

I don't see why you would need to derive anything. Just use std::basic_fstream::seekg() to skip to the part that you wish to read.
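For example, a partial read with seekg() could look like this (read_range and its parameters are illustrative names, not anything from the answer):

    #include <cstddef>
    #include <fstream>
    #include <vector>

    // Read `count` bytes starting at `offset` without touching the rest of the file.
    std::vector<unsigned char> read_range(const char* path,
                                          std::streamoff offset,
                                          std::size_t count)
    {
        std::ifstream in(path, std::ios::binary);
        std::vector<unsigned char> buf(count);
        in.seekg(offset);                                   // jump to the interesting part
        in.read(reinterpret_cast<char*>(buf.data()),
                static_cast<std::streamsize>(count));
        buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was actually read
        return buf;
    }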

Upvotes: 6

xtofl

Reputation: 41509

As an addition to @user2097303's answer, I want to point out that vector guarantees contiguous allocation. For long-running applications, this can lead to memory fragmentation: eventually no sufficiently large contiguous block of memory may be available, even though plenty of memory is free in total between the allocated blocks.

Therefore it may be a good idea to store your data in a std::deque instead.
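A minimal sketch of loading a file into a std::deque (since a deque has no single contiguous buffer to pass to read(), the bytes are staged through a small fixed-size chunk; the 64 KiB chunk size is an arbitrary choice for the example):

    #include <deque>
    #include <fstream>

    std::deque<unsigned char> load_into_deque(const char* path)
    {
        std::ifstream in(path, std::ios::binary);
        std::deque<unsigned char> data;
        char chunk[64 * 1024];                               // staging buffer
        // Keep reading while a full or partial chunk was obtained.
        while (in.read(chunk, sizeof chunk) || in.gcount() > 0)
            data.insert(data.end(), chunk, chunk + in.gcount());
        return data;
    }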

Upvotes: 2
