davewy

Reputation: 2031

MPI Shared Memory for Complex Object

I have a large-scale code that runs on many CPU cores, potentially across a number of compute nodes. The code is in C++ and is parallelized with OpenMPI.

My code has a very large object (~10GB RAM usage) that is read from by each MPI process. This object is updated very occasionally (and can be done by a single process, just reading in a data file).

What I've been doing so far is giving each MPI process a copy of this object; but that means I'm severely RAM-limited and can't use the full CPU power of my nodes. So, I've been reading about shared memory in the MPI 3 specification.

My question is: what is the best way to share a complex object across MPI processes? In all the examples I find, MPI shared memory windows are created and used to exchange simple data structures (floats, arrays of ints, etc.). My global object is a custom class type that includes a number of member variables, some of which are pointers, and many of which are other complex class types. Hence, I feel like I won't be able to just call MPI_Win_allocate_shared and pass in the address of my complex object, especially since I want to share all the info about the member variables (in particular, I want to share the underlying values of the pointer type member variables - i.e. sharing a "deep copy" across MPI processes, with all virtual memory addresses correct in each process).
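For context, the pattern in those simple examples is roughly the following sketch (hedged: the communicator/window names are illustrative, and this only shares a plain array of doubles, not a complex object):

```cpp
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Split COMM_WORLD into per-node communicators: ranks that can share
    // memory (same node) end up in the same shmcomm.
    MPI_Comm shmcomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);
    int shmrank;
    MPI_Comm_rank(shmcomm, &shmrank);

    // Rank 0 on each node allocates the whole segment; others allocate 0 bytes.
    const MPI_Aint nbytes = (shmrank == 0) ? 1024 * sizeof(double) : 0;
    double* base = nullptr;
    MPI_Win win;
    MPI_Win_allocate_shared(nbytes, sizeof(double), MPI_INFO_NULL,
                            shmcomm, &base, &win);

    // Non-zero ranks query rank 0's segment to get a locally valid pointer.
    if (shmrank != 0) {
        MPI_Aint size;
        int disp_unit;
        MPI_Win_shared_query(win, 0, &size, &disp_unit, &base);
    }

    // base now points at the same physical pages in every process on the node,
    // but possibly at a *different* virtual address in each process -- which is
    // exactly why raw pointers stored inside the segment would break.
    MPI_Barrier(shmcomm);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Note that `MPI_Win_shared_query` may return a different virtual address in each process, which is the crux of my pointer problem.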

Is it possible to achieve this "deep sharing" with MPI shared memory, and if so, is there a "best practice" for doing so? Or would another library (e.g. boost interprocess) make this more feasible/straightforward for me?

P.S. If I can't figure out a good solution, I will resort to a hybrid MPI+pthreads approach, where I know I can easily have this global object on each node with pthreads. But I'm really hoping to find an elegant MPI-only solution.

Upvotes: 2

Views: 2332

Answers (1)

Aleksander Stankiewicz

Reputation: 550

If you cross machine boundaries (i.e. your nodes span multiple machines), there is no easy way to achieve your goal. If you use only Windows or only Linux machines (not a mix), you can try attaching some shared resource to virtual memory (using the system API to do it efficiently). Another way would be to write custom serialization/deserialization code for your large object and store it in memory as a binary array, so it can be shared between processes on the same machine. The issue with storing just a raw "memory dump" is big/little endianness. If you use the dedicated MPI API, endianness (and data-representation issues in general) is handled properly for sure. I'm not sure at the moment whether PVM supports such a scenario better, but in the case of MPI I would start with direct use of virtual memory on the same machine (sharing only some access key between processes)...

Additional answer 1:

On one machine it should be simple, I think (you probably use Windows, so I will focus on that platform for the moment). Endian issues and data alignment don't matter in that case, because I assume you compile all your stuff with the same options (and run it on the same hardware). The easiest way to achieve your goal is to map a properly named file into virtual memory (the name doesn't matter until you create many mappings for different objects - in that case you need some naming scheme for consistency). A sample is here, for instance.

After creating the mapping, place all the object data there (using old-school memcpy, or placement new). Once all the data is available in the mapped memory, just send the file name, with a few additional attributes, to all the processes/nodes on the same machine. At the beginning of the mapped region you can place an array of pointers to objects (or allocation-address deltas, for instance) for easy linkage of all the related objects, if you have more than one there (in that case the first element should hold the number of entries in the array - it's just an idea). If you map the region to the same virtual address in every process, you don't have to manage pointers at all, if you're not interested in that :) In that case no pointer array is necessary!

An additional plus of using virtual memory is that it optimizes memory-page usage (pages are loaded on demand), so it will not swallow 10GB of RAM per process even with data objects that big.

BTW: Windows supports direct sharing of memory pages via a switch on sections, and you can use that support from C++.

Upvotes: 1
