janoliver

Reputation: 7824

Serializing vector of objects to std::string for use with MPI

I am trying to communicate a std::vector<MyClass> of varying size via MPI. MyClass contains members that are vectors, which may be empty or vary in size. To do that, I wrote a serialize() and a deserialize() function that write such a std::vector<MyClass> to a std::string and read it back; the string is what I then communicate via MPI.

class MyClass {
    ...
    int some_int_member;
    std::vector<float> some_vector_member;
};

std::vector<MyClass> deserialize(const std::string &in) {
    std::istringstream iss(in);

    // Number of MyClass objects in the payload
    size_t total_size = 0;
    iss.read(reinterpret_cast<char *>(&total_size), sizeof(total_size));

    std::vector<MyClass> out_vec;
    out_vec.resize(total_size);

    for(MyClass &d: out_vec) {
        size_t v_size = 0;
        iss.read(reinterpret_cast<char *>(&d.some_int_member), sizeof(d.some_int_member));
        iss.read(reinterpret_cast<char *>(&v_size), sizeof(v_size));
        d.some_vector_member.resize(v_size);
        // data() is safe on an empty vector; &v[0] is undefined behavior there
        iss.read(reinterpret_cast<char *>(d.some_vector_member.data()), v_size * sizeof(float));
    }

    return out_vec;
}


std::string serialize(const std::vector<MyClass> &data) {
    std::ostringstream os;

    // Number of MyClass objects, written first so the reader can resize
    size_t total_size = data.size();
    os.write(reinterpret_cast<const char *>(&total_size), sizeof(total_size));

    for(const MyClass &d: data) {
        size_t v_size = d.some_vector_member.size();
        os.write(reinterpret_cast<const char *>(&d.some_int_member), sizeof(d.some_int_member));
        os.write(reinterpret_cast<const char *>(&v_size), sizeof(v_size));
        // data() is safe on an empty vector; &v[0] is undefined behavior there
        os.write(reinterpret_cast<const char *>(d.some_vector_member.data()), v_size * sizeof(float));
    }
    return os.str();
}

My implementation works in principle, but sometimes (not always!) MPI processes crash at positions that I think are related to the serialization. The payload sent can be as big as hundreds of MB. I suspect that using std::string as a container is not a good choice. Are there limitations on using std::string as a container for char[] with huge binary data that I may be running into here?

(Note that I don't want to use boost::mpi along with its serialization routines, nor do I want to pull a huge library such as cereal into my project.)

Upvotes: 0

Views: 1182

Answers (1)

Zulan

Reputation: 22650

Generally, using std::string for binary data is fine, although some people might prefer std::vector<char>, or std::vector<std::byte> in C++17. Note that since C++11, std::string is guaranteed to store its data contiguously. There are two significant efficiency issues in your code:

  1. You always have three copies of the whole data. The original objects, the serialized string and the intermediate [io]stringstream.
  2. You cannot pre-allocate (reserve) data in ostringstream, which may lead to over-allocation and frequent reallocation.

Hence, you waste a significant amount of memory, which might contribute to bad_alloc. That said, it may be perfectly fine and you just have a memory leak somewhere. It's impossible to tell if this is a practical issue for you without knowing the cause of the bad_alloc and a performance analysis of your application.
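To illustrate how both issues could be avoided, here is a minimal sketch that skips the stringstream, computes the exact payload size up front, and reserves it once. It is not a drop-in replacement: MyClass is reduced to the two members shown in the question, put() and get() are hypothetical helper names, and it assumes all ranks share the same size_t width and byte order. The bounds checks in deserialize() are an addition of mine, so that a truncated or corrupt payload throws instead of reading out of range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the question's class; the elided members are left out.
struct MyClass {
    int some_int_member = 0;
    std::vector<float> some_vector_member;
};

// Hypothetical helper: append n raw bytes (src may be null when n == 0).
static void put(std::string &buf, const void *src, std::size_t n) {
    if (n != 0) buf.append(static_cast<const char *>(src), n);
}

// Hypothetical helper: copy n raw bytes out of buf at pos, then advance pos.
static void get(const std::string &buf, std::size_t &pos, void *dst, std::size_t n) {
    if (n > buf.size() - pos) throw std::out_of_range("payload truncated");
    if (n != 0) std::memcpy(dst, buf.data() + pos, n);
    pos += n;
}

std::string serialize(const std::vector<MyClass> &data) {
    // First pass: exact payload size, so the buffer is allocated only once.
    std::size_t bytes = sizeof(std::size_t);
    for (const MyClass &d : data)
        bytes += sizeof(int) + sizeof(std::size_t)
               + d.some_vector_member.size() * sizeof(float);

    std::string buf;
    buf.reserve(bytes);

    const std::size_t count = data.size();
    put(buf, &count, sizeof(count));
    for (const MyClass &d : data) {
        const std::size_t v_size = d.some_vector_member.size();
        put(buf, &d.some_int_member, sizeof(d.some_int_member));
        put(buf, &v_size, sizeof(v_size));
        put(buf, d.some_vector_member.data(), v_size * sizeof(float));
    }
    assert(buf.size() == bytes);  // the size computed up front was exact
    return buf;
}

std::vector<MyClass> deserialize(const std::string &buf) {
    std::size_t pos = 0;
    std::size_t count = 0;
    get(buf, pos, &count, sizeof(count));

    // Every element needs at least an int and a size_t, which bounds count.
    if (count > (buf.size() - pos) / (sizeof(int) + sizeof(std::size_t)))
        throw std::out_of_range("element count exceeds payload");

    std::vector<MyClass> out(count);
    for (MyClass &d : out) {
        std::size_t v_size = 0;
        get(buf, pos, &d.some_int_member, sizeof(d.some_int_member));
        get(buf, pos, &v_size, sizeof(v_size));
        // Reject sizes the remaining bytes cannot hold before allocating.
        if (v_size > (buf.size() - pos) / sizeof(float))
            throw std::out_of_range("vector size exceeds payload");
        d.some_vector_member.resize(v_size);
        get(buf, pos, d.some_vector_member.data(), v_size * sizeof(float));
    }
    return out;
}
```

With this layout only two copies exist at any time (the objects and the buffer), and buf.data() together with buf.size() can be handed to MPI_Send as MPI_CHAR. Keep in mind that the count argument of MPI_Send is an int, so a payload beyond INT_MAX bytes would have to be split.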

Upvotes: 1
