Reputation: 52338
I'm writing an implementation of a min-heap in C as part of Dijkstra's algorithm. I've got all of the specifics down and my test program passes valgrind tests, but it allocates ridiculous amounts of memory in the process. The final test is on a grid of INT_MAX by INT_MAX (coordinates are just integers), and I get SIGXCPU errors when I test. Even when I just insert 16k positions into the queue and then delete everything, it still takes a very long time and allocates over 8 MB. When I run it on the huge grid test cases, it can reach 500 MB before I quit manually. What could be going on? Here is part of my code:
struct position {
    int x;
    int y;
};

typedef struct elt {
    int priority;
    int distance;
    struct position p;
} *Elt;

typedef struct heap {
    int size;
    int capacity;
    Elt *elts;
} *Heap;

void heap_insert(Heap h, Elt e, int *counter) {
    /* double the backing array when it is nearly full */
    if(h->capacity < (h->size + 2)) {
        h->elts = realloc(h->elts, h->capacity * sizeof(Elt) * 2);
        h->capacity *= 2;
    }
    h->elts[h->size] = malloc(sizeof(struct elt));
    elt_assign(h->elts[h->size], e);
    h->size++;
    heapify(h->size, h->elts);
    *counter = *counter + 1;
}
All of my other functions do their memory management once, in-function, or not at all. The initial capacity in this case was 64, but I got the same effect starting at 1024. I also tried limiting the size of the queue, to no avail. I'm pretty sure it's not my heapifying code, but here it is just in case:
static void floatDown(int n, Elt *a, int pos) {
    /* copy the element being sifted; it is dropped into its final slot
       after the loop */
    Elt x = malloc(sizeof(struct elt));
    elt_assign(x, a[pos]);
    for(;;) {
        if(Child(pos, 1) < n && a[Child(pos, 1)]->priority < a[Child(pos, 0)]->priority) {
            /* the right child exists and is the smaller of the two */
            if(a[Child(pos, 1)]->priority < x->priority) {
                elt_assign(a[pos], a[Child(pos, 1)]);
                pos = Child(pos, 1);
            } else {
                break;
            }
        } else if(Child(pos, 0) < n && a[Child(pos, 0)]->priority < x->priority) {
            /* the left child is the smaller (or only) child */
            elt_assign(a[pos], a[Child(pos, 0)]);
            pos = Child(pos, 0);
        } else {
            break;
        }
    }
    elt_assign(a[pos], x);
    free(x);
}

static void heapify(int n, Elt *a) {
    for(int i = n - 1; i >= 0; i--) {
        floatDown(n, a, i);
    }
}
Any help would be much appreciated.
Upvotes: 4
Views: 502
Reputation: 4290
This is my working theory. I'm willing to discover I'm wrong, but without the rest of the code I can't instrument, run, and test it.
The indirection of struct heap { ... Elt *elts; } when typedef struct elt {...} *Elt; saves the cost of copying 4 ints by copying 1 pointer instead, but copying is fast, and it only happens about log2(N) times per sift.
Instead, every struct elt is malloc'd individually. Without digging around to find the actual size of a malloc'd block, we can estimate that, on average, this wastes roughly (N/2) * sizeof(struct elt) in total (actually, I think it is worse on my machine).
It may also create discontiguous blocks of memory (by putting small blocks between larger blocks), so that realloc must always allocate a bigger block and previously freed blocks are harder to reuse. In this specific case, I don't think that matters as much as the waste due to internal fragmentation, or the large number of calls to malloc.
It might also create a 'cache buster': the actual values are spread throughout memory, and each cache line is relatively sparse because of the internal fragmentation of the malloc'd struct elt blocks.
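To put a number on that estimate, here is a small standalone program that asks the allocator how big the block backing one struct elt really is. malloc_usable_size is a glibc extension and the exact figures vary by allocator, so treat this as illustrative rather than portable:

/* Illustrative only: malloc_usable_size is a glibc extension. It shows
   how much of each malloc'd block a 16-byte struct elt actually uses. */
#include <malloc.h>   /* malloc_usable_size (glibc) */
#include <stdio.h>
#include <stdlib.h>

struct position { int x; int y; };
struct elt { int priority; int distance; struct position p; };

int main(void) {
    struct elt *e = malloc(sizeof *e);
    printf("payload %zu bytes, usable block %zu bytes\n",
           sizeof *e, malloc_usable_size(e));
    /* On 64-bit glibc this typically prints 16 vs 24, and the hidden
       chunk header adds another 8 bytes, so each element costs roughly
       twice its payload. */
    free(e);
    return 0;
}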
So replace:
typedef struct elt {
    int priority;
    int distance;
    struct position p;
} *Elt;

typedef struct heap {
    int size;
    int capacity;
    Elt *elts;
} *Heap;
with
typedef struct elt {
    int priority;
    int distance;
    struct position p;
} Elt; // no longer a pointer

typedef struct heap {
    int size;
    int capacity;
    Elt *elts; // now one contiguous array of struct elt
} *Heap;
and change:
void heap_insert(Heap h, Elt e, int *counter) {
    if(h->capacity < (h->size + 2)) {
        h->elts = realloc(h->elts, h->capacity * sizeof(Elt) * 2);
        h->capacity *= 2;
    }
    h->elts[h->size] = malloc(sizeof(struct elt));
    elt_assign(h->elts[h->size], e);
    h->size++;
    heapify(h->size, h->elts);
    *counter = *counter + 1;
}
to
void heap_insert(Heap h, Elt e, int *counter) {
    if(h->capacity < (h->size + 2)) {
        h->elts = realloc(h->elts, h->capacity * sizeof(Elt) * 2);
        h->capacity *= 2;
    }
    h->elts[h->size] = e; // no longer need to malloc
    h->size++;
    heapify(h->size, h->elts);
    *counter = *counter + 1;
}
So the amount of memory malloc'd/realloc'd to hold the heap should be roughly 2 * N * sizeof(struct elt). The function/macro elt_assign can likely be changed to hide the remaining changes.
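For illustration, once Elt is a value type, elt_assign can collapse to a plain struct assignment. The post never shows its original definition, so this is an assumption:

/* Hypothetical: the original elt_assign is not shown in the post. With
   Elt as a value type, assignment copies all four ints in one go. */
#define elt_assign(dst, src) ((dst) = (src))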
Then reduce the amount of malloc'ing further by changing:
static void floatDown(int n, Elt *a, int pos) {
    Elt x = malloc(sizeof(struct elt));
    elt_assign(x, a[pos]);
    ...
    elt_assign(a[pos], x);
    free(x);
}
to
static void floatDown(int n, Elt *a, int pos) {
    Elt x = a[pos];
    ...
    a[pos] = x;
}
That should further reduce the amount of memory malloc'ed and free'd.
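To fill in the elided body, a value-semantics floatDown might look like the sketch below. It assumes Child(pos, c) expands to the usual 2*pos + c + 1 array layout; the post does not show that macro, so that part is a guess:

static void floatDown(int n, Elt *a, int pos) {
    Elt x = a[pos];                      /* one struct copy, no malloc */
    for (;;) {
        int c = Child(pos, 0);           /* left child */
        if (c >= n)
            break;                       /* pos is a leaf */
        if (c + 1 < n && a[c + 1].priority < a[c].priority)
            c = c + 1;                   /* right child is smaller */
        if (a[c].priority >= x.priority)
            break;                       /* heap property holds */
        a[pos] = a[c];                   /* move the smaller child up */
        pos = c;
    }
    a[pos] = x;                          /* drop x into its final slot */
}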
Essentially, there should be only (approximately) log2(N) calls to realloc. There is also a better chance that realloc simply extends the existing block rather than copying it.
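As a side note (a common hardening, not something the original code does): checking realloc's return value means a failed grow does not leak the old block. heap_grow is a name I am inventing here:

/* Hypothetical helper; needs <stdio.h> and <stdlib.h> for perror/exit. */
static void heap_grow(Heap h) {
    if (h->capacity >= h->size + 2)
        return;                          /* still room, nothing to do */
    int newcap = h->capacity * 2;
    Elt *tmp = realloc(h->elts, newcap * sizeof *tmp);
    if (tmp == NULL) {                   /* old block is still valid */
        perror("realloc");
        exit(EXIT_FAILURE);
    }
    h->elts = tmp;
    h->capacity = newcap;
}

With doubling, this still calls realloc only about log2(N) times in total.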
Edit:
There is a bigger problem in heap_insert than memory allocation:
void heap_insert(Heap h, Elt e, int *counter) {
    ...
    heapify(h->size, h->elts);
    ...
}
heapify is called for every element inserted into the heap, i.e. heapify is called N times. heapify is:
static void heapify(int n, Elt *a) {
    for(int i = n - 1; i >= 0; i--) {
        floatDown(n, a, i);
    }
}
That calls floatDown on every element in the heap so far, once for each element inserted, so building the heap does roughly 1 + 2 + ... + N ≈ N^2/2 units of work, i.e. heap_insert has O(N^2) total run time.
I believe heap_insert should fix up the heap for just the element it adds, by floating the new element up toward the root (the mirror image of floatDown), rather than calling heapify on the whole array.
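A minimal sketch of that per-insert fix-up, using the value-based Elt from above. floatUp is my name for it; none of this appears in the post:

static void floatUp(Elt *a, int pos) {
    Elt x = a[pos];
    while (pos > 0) {
        int parent = (pos - 1) / 2;      /* standard binary-heap layout */
        if (a[parent].priority <= x.priority)
            break;                       /* parent is small enough */
        a[pos] = a[parent];              /* pull the parent down */
        pos = parent;
    }
    a[pos] = x;
}

void heap_insert(Heap h, Elt e, int *counter) {
    heap_grow(h);                        /* from the sketch above */
    h->elts[h->size] = e;
    floatUp(h->elts, h->size);           /* O(log N) instead of O(N) */
    h->size++;
    *counter = *counter + 1;
}

That turns building the heap from O(N^2) into O(N log N).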
Upvotes: 2