user2912705

Reputation: 1

Working with huge arrays in C++

I'm currently picking up the code of a previous student and am looking for things to improve. It was written in Java, and I am porting it to C++, with which I am more familiar.

The basic problem is that we are simulating a large number of random trajectories and storing the results in arrays. In his current code there are 3000 trajectories, each with 20000 timesteps, so he has used 3000 x 20000 arrays to store the positions, velocities, and a number of other system properties. The arrays are generated from values in other arrays (for example, temperature[0][j] depends upon position[0][j]). I know the code is always going to take a while to run, but I'm not sure whether this is the most efficient way of going about it.

Upvotes: 0

Views: 377

Answers (3)

If your concern is performance, the question is whether your caches like the data layout. Large arrays that you step through line by line are usually fine: the data is loaded into the cache once (possibly by prefetching), worked upon, and then written back or evicted. The only thing that may become a problem is inefficient prefetching if a loop walks through more arrays at once than your CPU's hardware prefetcher can track.

Putting correlated values into a structure and building a large array of these structures would also be fine iff all the data within that structure is used in each pass through the array. If a pass uses only some of the fields, the processor still loads the unused ones from memory along with the rest, which wastes bandwidth and slows things down.

So unless every pass touches every field, you are better off staying with the separate-array approach.
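
A minimal sketch of the separate-array layout, assuming each property is kept in one flat, contiguous buffer indexed in row-major order. The `Trajectories` type, the `index` helper, and the relation `temperature = position * position` are hypothetical placeholders, not taken from the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one flat, contiguous buffer per property.
// Element (traj, step) lives at index traj * steps + step (row-major).
struct Trajectories {
    std::size_t count;   // number of trajectories
    std::size_t steps;   // timesteps per trajectory
    std::vector<double> position;
    std::vector<double> temperature;

    Trajectories(std::size_t n, std::size_t m)
        : count(n), steps(m), position(n * m, 0.0), temperature(n * m, 0.0) {}

    std::size_t index(std::size_t traj, std::size_t step) const {
        assert(traj < count && step < steps);
        return traj * steps + step;
    }
};

// Placeholder physics (an assumption for illustration only):
// temperature at (i, j) is the square of position at (i, j).
void update_temperature(Trajectories& t) {
    for (std::size_t i = 0; i < t.count; ++i) {
        // The inner loop runs over timesteps, so both arrays are
        // read and written at consecutive addresses.
        for (std::size_t j = 0; j < t.steps; ++j) {
            const std::size_t k = t.index(i, j);
            t.temperature[k] = t.position[k] * t.position[k];
        }
    }
}
```

With the sizes from the question (3000 trajectories of 20000 timesteps), each array of `double` holds 60 million elements, roughly 480 MB, so the number of properties kept in memory at once matters as much as the traversal order.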

Upvotes: 0

Doug T.

Reputation: 65599

For your sanity, I'd consider a large array of structs/classes, one per entity. For example:

 struct Entity {
    int position_x;
    int position_y;
    int temperature;
 };

You can pack down the size by shrinking each field using bit fields and some compiler-specific attributes to specify the overall size of the struct.
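
As a rough sketch of the bit-field idea, the struct below squeezes the three fields into 32 bits. The bit widths (12, 12, 8) and the `pack` helper are assumptions chosen for illustration, and the exact layout of bit fields is implementation-defined, so the resulting size should be checked on the target compiler. No compiler-specific attributes are used here:

```cpp
#include <cassert>
#include <cstdint>

// Plain layout: three full-width ints.
struct Entity {
    int position_x;
    int position_y;
    int temperature;
};

// Packed variant. The widths are illustrative assumptions:
// they only work if the values really fit these ranges.
struct PackedEntity {
    std::uint32_t position_x  : 12;  // 0..4095
    std::uint32_t position_y  : 12;  // 0..4095
    std::uint32_t temperature : 8;   // 0..255
};

// Hypothetical helper: convert a plain Entity to the packed form,
// checking that every value fits into its bit field.
PackedEntity pack(const Entity& e) {
    assert(e.position_x >= 0 && e.position_x < 4096);
    assert(e.position_y >= 0 && e.position_y < 4096);
    assert(e.temperature >= 0 && e.temperature < 256);
    PackedEntity p{};
    p.position_x  = static_cast<std::uint32_t>(e.position_x) & 0xFFFu;
    p.position_y  = static_cast<std::uint32_t>(e.position_y) & 0xFFFu;
    p.temperature = static_cast<std::uint32_t>(e.temperature) & 0xFFu;
    return p;
}
```

The trade-off is that packing reduces memory traffic but costs extra shift and mask instructions on every access, and it only applies if the simulated quantities can be quantized to small integers.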

Upvotes: 1

Alexander L. Belikoff

Reputation: 5711

It really depends on what you are trying to do. If you work with one path at a time (i.e. doing some kind of Monte Carlo), then the best way would be to generate a path and then discard it once you get the data along it. If not, then, assuming your path space doesn't fit in memory, I'd generate and save all paths in a reasonably efficient format for quick access, then mmap the file.
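
The one-path-at-a-time case could look roughly like the sketch below: a single buffer is reused for every path, and only summary statistics survive. The Gaussian random walk, the `PathStats` fields, and the `simulate` function are hypothetical stand-ins for whatever the real model and observables are:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Running summary kept across paths; the paths themselves are not stored.
struct PathStats {
    std::size_t paths = 0;
    double sum_final = 0.0;     // sum of final positions
    double sum_final_sq = 0.0;  // sum of squared final positions
    double max_abs = 0.0;       // largest |position| seen on any path

    double mean_final() const {
        return paths ? sum_final / static_cast<double>(paths) : 0.0;
    }
};

// Hypothetical model (an assumption): a Gaussian random walk starting at 0.
PathStats simulate(std::size_t n_paths, std::size_t n_steps, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> step(0.0, 1.0);
    std::vector<double> path(n_steps);  // one buffer, reused for every path
    PathStats stats;

    for (std::size_t p = 0; p < n_paths; ++p) {
        // Generate one path into the reusable buffer.
        double x = 0.0;
        for (std::size_t j = 0; j < n_steps; ++j) {
            x += step(rng);
            path[j] = x;
        }
        // Extract what is needed along the path...
        for (double v : path) {
            if (std::fabs(v) > stats.max_abs) stats.max_abs = std::fabs(v);
        }
        stats.sum_final += x;
        stats.sum_final_sq += x * x;
        ++stats.paths;
        // ...then let the next iteration overwrite the buffer.
    }
    return stats;
}
```

Memory use here is one path of `n_steps` values regardless of how many paths are simulated, which is the main appeal when the per-path results can be reduced to a few numbers.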

Upvotes: 1
