V. Delecroix
V. Delecroix

Reputation: 204

C/C++ allocation of arrays of array like objects

I am mostly a C programmer, and I am looking for a fast and elegant solution to do what I want in C++. Let us consider this simple data structure

struct mystruct
{
    int * array1;
    int * array2;
    size_t size;
};

The two pointers array1 and array2 are to be thought as two arrays of length size. I need a huge amount of these (about 2**30 or 1.000.000.000) all of the same small size (about 100). All of them will be deallocated at the exact same time. I can do the following in C with only one call to malloc where K is the number of struct I need and N is the size of the arrays

EDITED VERSION (see the old one below)

size_t NN = N * sizeof(int);
struct mystruct * my_objects = malloc(K * sizeof(struct mystruct));
int * memory = malloc(2*K*NN);
for(i=0; i<K; ++i)
{
    my_objects[i].size = N;
    my_objects[i].array1 = memory + 2*i*NN;
    my_objects[i].array2 = memory + (2*i+1)*NN;
}
...
free(my_objects);
free(memory);

This version does not support very huge K and does not allow me to resize the array. But it is not so hard to design something for that purpose. Is there a way of creating a class in C++ that would be a kind of std::vector<mystruct> with forbidden shrinking and for which the allocation of array1 and array2 would not be based on dynamical allocation for each entry? I do want to minimize the effect of memory allocation since K is very big.

OLD VERSION:

size_t KK = K * sizeof(mystruct);
size_t NN = N * sizeof(int);
struct mystruct * my_objects = (struct mystruct *) malloc(KK + 2*K*NN);
for(i=0; i<K; ++i)
{
    my_objects[i].size = N;
    my_objects[i].array1 = (int *) (my_objects + KK + 2*i*NN);
    my_objects[i].array2 = (int *) (my_objects + KK + (2*i+1)*NN);
}

Upvotes: 1

Views: 165

Answers (6)

Xirema
Xirema

Reputation: 20396

What you're trying to do can be done using the exact same code in c++.

However, it is utterly inadvisable in c++. The reason c++ has Object-Oriented semantics is to avoid the very situation you're reckoning with. Here's how I would handle this:

struct mystruct {
    vector<int> array1;
    vector<int> array2;
    mystruct(size_t size);
}

mystruct::mystruct(size_t size) {
    array1.resize(size);
    array2.resize(size);
}

int main() {
    vector<mystruct> mystructarray(numOfStructs, numOfElementsOfArray1AndArray2);
    //EDIT: You don't need to expressly call the mystruct constructor, it'll be implicitly called with the variable passed into the vector constructor.
    //Do whatever
    return 0;
}

vector objects can be queried for their size at runtime, so there's no need to store size as a field of mystruct. And since you can define constructors for structs, it's best to handle creation of the object in that way. Finally, with a valid constructor, you can initialize an array of mystruct with a vector, passing in a valid argument for mystruct's constructor to build the vector.

DOUBLE EDIT COMBO: Alright, let's try a different approach.

Based on what you indicated in your comments, it sounds like you need to allocate a LOT of memory. I'm thinking this data has specific meaning in your application, which means it doesn't make a lot of sense to use generic data structures for your data. So here's what I'm proposing:

class mydata {
private:
    size_t num_of_sets;
    size_t size_of_arrays;

    std::vector<int> data;

public:
    mydata(size_t _sets, size_t _arrays)
        : data(_sets * _arrays * 2),
        num_of_sets(_sets),
        size_of_arrays(_arrays) {}

    int * const array1(size_t);
    int * const array2(size_t);
};

int * const mydata::array1(size_t index)
{
    return &(data[index*size_of_arrays * 2]);
}

int * const mydata::array2(size_t index)
{
    return &(data[index*size_of_arrays * 2 + size_of_arrays]);
}

int main(int argc, char** argv) {
    mydata data(16'777'216, 10);

    data.array1(5)[5] = 7;
    data.array2(7)[2] = 8;

    std::cout << "Value of index 5's array1 at index 5: " << data.array1(5)[5] << std::endl;
    std::cout << "Value of index 7's array2 at index 2: " << data.array2(7)[2] << std::endl;
    //Do Something
    return 0;
}

Upvotes: -1

ams
ams

Reputation: 25579

If N and K are known at compile time, but can be different in different places, then a template will work:

template <int N, int K>
struct Memory {
  Memory() {
    for (int i=0; i < K; i++) {
      mystruct[i].array1 = data1[i];
      mystruct[i].array2 = data2[i];
      size[i] = N;
    }
  }

  struct mystruct {
    int * array1;
    int * array2;
    size_t size;
  } mystructs[K];

  int data1[K][N];
  int data2[K][N];
};

void f() {
  // The constructor sets up all the pointers.
  Memory *m<100,200> = new Memory<100,200>();

  .....
}

(I've not checked if that compiles.)

If the values are not known then I would not attempt to do this in one allocation; it makes more sense to do two allocations, one for an array of mystruct, and one for the integers. The extra overhead is minimal, and the code is much more maintainable.

struct Memory {
  Memory(int N, int K) {
    mystructs = new mystruct[K];
    data = new int[2*K*N];

    for (int i=0; i < K; i++) {
      array1[i] = &data1[2*i*N];
      array2[i] = &data2[(2*i+1)*N];
      size[i] = N;
    }
  }

  struct mystruct {
    int * array1;
    int * array2;
    size_t size;
  } mystruct *mystructs;

  int *data;
};

(Again, I've not checked that compiles.)

Note that where your code has 2*i*N*sizeof(int) you have a bug because C pointer arithmetic does not count bytes; it counts multiples of the pointer type. In my code I've made this explicit by taking the address of an array item, but the maths is the same.

Upvotes: 0

DevSolar
DevSolar

Reputation: 70263

Note: Solution created with minimal manual memory handling in mind, before OP edited in that his main requirement was performance due to a very large K. As std::vector still does behind-the-scenes memory allocations, this isn't a fast solution, just an elegant one.

Might be improved with a custom memory allocator, but I think @Simple's answer is better all-around, especially if encapsuled in a wrapper class.



struct MyStruct
{
    std::vector< int > array1;
    std::vector< int > array2;
    std::size_t size;

    MyStruct( std::size_t init_size ) :
        array1( std::vector< int >( init_size ) ),
        array2( std::vector< int >( init_size ) ),
        size( init_size )
    {}
};

// ...

std::vector< MyStruct > my_objects( K, N );

No dynamic memory allocation at all. (Well, not by you, anyway.)

Upvotes: 1

Florian Kaufmann
Florian Kaufmann

Reputation: 803

The following does two memory allocations, one for each vector. Naturally you have to ensure that the ints vector lives longer than mystructs vector, since mystructs's members refer to ints's members.

  struct mystruct
  {
    int* array1;
    int* array2;
    std::size_t size;
  };

  std::vector<int> ints(N*2*K);
  std::vector<mystruct> mystructs(K);
  for (std::size_t i=0; i<K; i++) {
    mystruct& ms = mystructs[i];
    ms.array1 = &ints[2*N*i];
    ms.array2 = &ints[2*N*i+1];
    ms.size = N;
  }

Update: As tp1 pointed out, std::vector might reseat its internal array, invalidating all pointers into it. If you never add or remove elements, that is not an issue. If you do, consider using std::deque instead for ints. However then you also have more memory allocations upon construction, see What really is a deque in STL?. Note that sadly C++ does not allow a const std::vector of non-const elements, see Const vector of non-const objects.

Upvotes: 1

Simple
Simple

Reputation: 14390

Here's my literal translation from C to C++ that maintains the same memory layout:

std::unique_ptr<int[]> const memory(new int[2 * K * N]);

std::vector<mystruct> my_objects;
my_objects.reserve(K);

for (int i = 0; i < K; ++i)
{
    mystruct const tmp = {N, memory + 2*i*NN, memory + (2*i+1)*NN};
    my_objects.push_back(tmp);
}

Upvotes: 4

SergeyA
SergeyA

Reputation: 62583

What you are doing here in C is that allocating an array externally to your struct, and than pointing pointers to the different parts of this array.

You can do exactly the same thing with std::vector<> - have a huge vector defined outside of your struct, and point pointers to different parts of this vector. Same thing exactly.

Upvotes: 0

Related Questions