Reputation: 1060
In Linux, the kernel doesn't allocate any physical memory pages until we actually use that memory, but I am having a hard time figuring out why it does in fact allocate this memory:
for(int t = 0; t < T; t++){
    for(int b = 0; b < B; b++){
        Matrix[t][b].length = 0;
        Matrix[t][b].size = 60;
        Matrix[t][b].pointers = (Node**)malloc(60*sizeof(Node*));
    }
}
I then access this data structure to add one element to it like this:
Node* elem = NULL;
Matrix[a][b].pointers[ Matrix[a][b].length ] = elem;
Matrix[a][b].length++;
Essentially, I run my program with htop on the side, and Linux does allocate more memory if I increase the "60" I have in the code above. Why? Shouldn't it only allocate one page when the first element is added to the array?
Upvotes: 2
Views: 224
Reputation: 2899
Here's a sketch of what you could do if you expect your b arrays to usually be small, with fewer than 2^X pointers (X = 5 in the code below), while still handling the exceptional cases where they grow bigger.
You can adjust X down if your expected usage doesn't match. You could also adjust the minimum array size up from 0 (and not allocate the smaller 2^i levels) if you expect most of your arrays to use at least 2^Y pointers (e.g., Y = 3).
If you think that X == Y (e.g., 4) for your usage pattern, then you can just do one allocation of B * (0x1 << X) * sizeof(Node*) per t and divvy that block up among its b arrays. Then, if a b array needs to exceed 2^X pointers, resort to malloc for it, followed by realloc if it needs to grow even further; a minimal illustration of that simpler variant follows.
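If that simpler fixed-capacity variant is all you need, here's a minimal, self-contained illustration. The names (Row, Elem, row_init, elem_push) and the choice of X == 4 are purely illustrative and aren't reused in the full sketch further down.
#include <stdlib.h>
#include <string.h>

typedef struct Node Node;   // only Node pointers are stored, so an incomplete type is enough here

enum { NB = 1024, FIXED_CAP = 0x1 << 4 };   // illustrative sizes: 1024 b arrays, 2^4 pointers each

typedef struct { int length, cap; Node **pointers; } Elem;
typedef struct { Node **base; Elem e[NB]; } Row;

// One big allocation per row, carved into NB fixed slices; almost none of it
// is touched until elements are actually added.
static int row_init(Row *r)
{
    r->base = malloc((size_t)NB * FIXED_CAP * sizeof(Node*));
    if (NULL == r->base)
        return -1;
    for (int b = 0; b < NB; b++)
        r->e[b] = (Elem){ 0, FIXED_CAP, &r->base[(size_t)b * FIXED_CAP] };
    return 0;
}

// Append a Node pointer; fall back to malloc/realloc only once a slice overflows.
static int elem_push(Elem *e, Node *n)
{
    if (e->length == e->cap)
    {
        Node **p;
        if (e->cap == FIXED_CAP)    // still inside the shared block: copy out to the heap
        {
            p = malloc(2u * e->cap * sizeof(Node*));
            if (p)
                memcpy(p, e->pointers, e->length * sizeof(Node*));
        }
        else                        // already heap-owned: realloc can grow it in place
            p = realloc(e->pointers, 2u * e->cap * sizeof(Node*));
        if (NULL == p)
            return -1;
        e->pointers = p;
        e->cap *= 2;
    }
    e->pointers[e->length++] = n;
    return 0;
}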
The main point here is that the initial allocation will map to very little physical memory, addressing the problem that spurred your original question.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define T 1278
#define B 131072
#define CAP_MAX_LG2 5
#define CAP_MAX (0x1 << CAP_MAX_LG2) // pre-alloc each T to handle all B arrays of length up to 2^CAP_MAX_LG2

typedef struct Node Node;
struct Node { int dummy; }; // just a dummy definition so sizeof(Node) compiles

typedef struct
{
    int t;        // so a matrix element can know to which T_Allocation it belongs
    int length;
    int cap_lg2;  // log base 2 of capacity; -1 if capacity is zero
    Node **pointers;
} MatrixElem;

typedef struct
{
    Node **base; // pre-allocs B * 2^(CAP_MAX_LG2 + 1) Node pointers; every b array can be any of { 0, 1, 2, 4, 8, ..., CAP_MAX } capacity
    Node **frees_pow2[CAP_MAX_LG2 + 1]; // frees_pow2[i] will point at the next free array of 2^i pointers to Node to allocate to a growing b array
} T_Allocation;

MatrixElem Matrix[T][B];
T_Allocation T_Allocs[T];

int Node_init(Node *n) { return 0; }                      // just a dummy
void Node_fini(Node *n) { }                               // just a dummy
int Node_eq(const Node *n1, const Node *n2) { return 0; } // just a dummy

void Init(void)
{
    for(int t = 0; t < T; t++)
    {
        T_Allocs[t].base = malloc(B * (0x1 << (CAP_MAX_LG2 + 1)) * sizeof(Node*));
        if (NULL == T_Allocs[t].base)
            abort();
        // level x starts after the B * (2^x - 1) pointers used by all the smaller levels
        T_Allocs[t].frees_pow2[0] = T_Allocs[t].base;
        for (int x = 1; x <= CAP_MAX_LG2; ++x)
            T_Allocs[t].frees_pow2[x] = &T_Allocs[t].base[B * ((0x1 << x) - 1)];

        for(int b = 0; b < B; b++)
        {
            Matrix[t][b].t = t;
            Matrix[t][b].length = 0;
            Matrix[t][b].cap_lg2 = -1;
            Matrix[t][b].pointers = NULL;
        }
    }
}

Node *addElement(MatrixElem *elem)
{
    if (-1 == elem->cap_lg2 || elem->length == (0x1 << elem->cap_lg2)) // elem needs a bigger pointers array to add an element
    {
        int new_cap_lg2 = elem->cap_lg2 + 1;
        int new_cap = (0x1 << new_cap_lg2);

        if (new_cap_lg2 <= CAP_MAX_LG2) // new b array can still fit in pre-allocated space in T
        {
            Node **new_pointers = T_Allocs[elem->t].frees_pow2[new_cap_lg2];
            if (elem->length > 0)
                memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
            elem->pointers = new_pointers;
            T_Allocs[elem->t].frees_pow2[new_cap_lg2] += new_cap;
        }
        else if (elem->cap_lg2 == CAP_MAX_LG2) // exceeding pre-alloc'ed arrays in T; use malloc
        {
            Node **new_pointers = malloc(new_cap * sizeof(Node*));
            if (NULL == new_pointers)
                return NULL;
            memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
            elem->pointers = new_pointers;
        }
        else // already exceeded pre-alloc'ed arrays in T; use realloc
        {
            Node **new_pointers = realloc(elem->pointers, new_cap * sizeof(Node*));
            if (NULL == new_pointers)
                return NULL;
            elem->pointers = new_pointers;
        }
        ++elem->cap_lg2;
    }

    Node *ret = malloc(sizeof(Node));
    if (ret)
    {
        Node_init(ret);
        elem->pointers[elem->length] = ret;
        ++elem->length;
    }
    return ret;
}

int removeElement(const Node *a, MatrixElem *elem)
{
    int i;
    for (i = 0; i < elem->length && !Node_eq(a, elem->pointers[i]); ++i);
    if (i == elem->length)
        return -1;
    Node_fini(elem->pointers[i]);
    free(elem->pointers[i]);
    --elem->length;
    memmove(&elem->pointers[i], &elem->pointers[i+1], sizeof(Node*) * (elem->length - i));
    return 0;
}

int main()
{
    return 0;
}
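The main above is just a stub so the sketch compiles. A hypothetical way to exercise it (illustrative only; with the dummy Node_eq always returning 0, removeElement won't actually find a match here) would be to replace it with something like:
int main()
{
    Init();                              // one big, mostly untouched malloc per t

    MatrixElem *me = &Matrix[3][7];      // arbitrary element, just for illustration
    Node *first = addElement(me);        // first add grabs a capacity-1 slice from T_Allocs[3]
    if (NULL == first)
        return -1;

    for (int i = 0; i < 100; ++i)        // push past CAP_MAX to exercise the malloc/realloc fallback
        if (NULL == addElement(me))
            return -1;

    removeElement(first, me);            // with a real Node_eq this would free first and compact the array
    return 0;
}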
Upvotes: 0
Reputation: 2899
Your assumption that malloc / new doesn't cause any memory to be written, and therefore no physical memory to be assigned by the OS, is incorrect (for the memory allocator implementation you have).
I've reproduced the behavior you are describing in the following simple program:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char **array[128][128];
    int size;
    int i, j;

    if (1 == argc || 0 >= (size = atoi(argv[1])))
        fprintf(stderr, "usage: %s <num>; where num > 0\n", argv[0]), exit(-1);

    for (i = 0; i < 128; ++i)
        for (j = 0; j < 128; ++j)
            if (NULL == (array[i][j] = malloc(size * sizeof(char*))))
            {
                fprintf(stderr, "malloc failed when i = %d, j = %d\n", i, j);
                perror(NULL);
                return -1;
            }

    sleep(10);
    return 0;
}
When I run this with various small size parameters as input, the VIRT and RES memory footprints (as reported by top) grow together in step, even though I'm not explicitly touching the inner arrays that I'm allocating.
This basically holds true until size exceeds ~512. Thereafter, RES stays constant at 64 MiB while VIRT can be extremely large (e.g., 1220 GiB when size is 10M). That is because 512 * 8 = 4096, which is a common virtual page size on Linux systems, and 128 * 128 * 4096 B = 64 MiB.
Therefore, it looks like the first page of every allocation is being mapped to physical memory, probably because malloc / new itself is writing to part of the allocation for its own internal bookkeeping. Of course, lots of small allocations may fit in and be placed on the same page, so only one page gets mapped to physical memory for many such allocations.
In your code example, changing the size of the array matters because it means fewer of those arrays can fit on one page, therefore requiring more memory pages to be touched by malloc / new itself (and therefore mapped to physical memory by the OS) over the run of the program.
When you use 60, that takes about 480 bytes, so ~8 of those allocations can be put on one page. When you use 100, that takes about 800 bytes, so only ~5 of those allocations can be put on one page. So, I'd expect the "100 program" to use about 8/5ths as much memory as the "60 program", which seems to be a big enough difference to make your machine start swapping to stable storage.
If each of your smaller "60" allocations were already over 1 page in size, then changing it to be bigger "100" wouldn't affect your program's initial physical memory usage, just like you originally expected.
PS - I think whether or not you explicitly touch the initial page of your allocations is irrelevant, as malloc / new will have already done so (for the memory allocator implementation you have).
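If you'd rather measure this than watch top, here's a separate little program (my own illustration, not part of the test above) that reads the resident set size from /proc/self/statm before and after a batch of allocations it never writes to:
#include <stdlib.h>
#include <stdio.h>

// Returns the process's resident set size in pages, as reported by the second
// field of /proc/self/statm; returns -1 if the file can't be read.
static long rss_pages(void)
{
    long size, resident;
    FILE *f = fopen("/proc/self/statm", "r");
    if (NULL == f)
        return -1;
    if (2 != fscanf(f, "%ld %ld", &size, &resident))
        resident = -1;
    fclose(f);
    return resident;
}

int main(void)
{
    long before = rss_pages();

    for (int i = 0; i < 100000; ++i)
        if (NULL == malloc(480))    // ~60 pointers' worth each; never written to by us
            return -1;

    long after = rss_pages();
    printf("RSS grew by roughly %ld pages without us touching a single allocation\n",
           after - before);
    return 0;
}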
Upvotes: 2
Reputation: 2899
It depends on how your Linux system is configured.
Here's a simple C program that tries to allocate 1TB of memory and touches some of it.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
    char *array[1000];
    int i;

    for (i = 0; i < 1000; ++i)
    {
        if (NULL == (array[i] = malloc((int) 1e9)))
        {
            perror("malloc failed!");
            return -1;
        }
        array[i][0] = 'H';
    }

    for (i = 0; i < 1000; ++i)
        printf("%c", array[i][0]);
    printf("\n");

    sleep(10);
    return 0;
}
When I run top by its side, it says the VIRT memory usage goes to 931g (where g means GiB), while RES only goes to 4380 KiB.
Now, when I change my system to use a different overcommit strategy with /sbin/sysctl -w vm.overcommit_memory=2 and re-run it, I get:
malloc failed!: Cannot allocate memory
So your system may be using a different overcommit strategy than you expected. For more information, read the Linux kernel's documentation on overcommit accounting.
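If you want to check which strategy a given system is currently using (0 = heuristic, the default; 1 = always overcommit; 2 = strict accounting, the mode that makes the allocation above fail immediately), here's a tiny illustrative reader of /proc/sys/vm/overcommit_memory:
#include <stdio.h>

int main(void)
{
    int mode;
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (NULL == f)
    {
        perror("fopen");
        return -1;
    }
    if (1 != fscanf(f, "%d", &mode))
    {
        fclose(f);
        return -1;
    }
    fclose(f);
    printf("vm.overcommit_memory = %d\n", mode);   // 0, 1, or 2
    return 0;
}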
Upvotes: 7