Reputation: 401
So when we want to use an array without knowing its size in advance, the common procedure is to start small and then keep doubling the size and reallocating as we go along, right?
And then I assume that once we are done, we will want to realloc once more to free up the excess memory that was malloced but never used?
#include <stdio.h>
#include <stdlib.h>

int main()
{
    // unknown = an unknown amount of work we need to do
    int unknown = 1234;
    // size = the currently allocated memory, in elements
    int size = 10;
    // counter = the final size of the array
    int counter = 0;
    // first we allocate a small amount for our array
    int *array = (int *) malloc(size * sizeof(int));
    // and then start working
    for (int i = 0; i < unknown; i++)
    {
        // work
        array[i] = i;
        counter++;
        // check the size of the array to see if we need to realloc
        if (counter == size)
        {
            size *= 2;
            array = (int *) realloc(array, size * sizeof(int));
        }
    }
    // when all of the work is done we then shorten it to the exact size
    array = (int *) realloc(array, counter * sizeof(int));
    printf("%d", counter);
}
Question 1: Is this the most performant way of tackling this problem?
Question 2: Since we need to keep track of the size of the array do most people create a struct for this?
typedef struct list
{
    int size;
    int *array;
} list;
Upvotes: 0
Views: 139
Reputation: 21
If you malloc'd the array, you can use platform-specific methods to get the size of the allocation. On Windows you can use VirtualQuery with mbi.RegionSize to get the size.
Upvotes: 0
Reputation: 214770
the common procedure is to start small and then keep doubling the size and reallocating it as we go along right?
I'd say the most common approach is to start reasonably and then indeed keep doubling the size. Usually to a multiple of the CPU word alignment.
And then I assume once we are done we will want to realloc once more to free up the excess memory
No. Pre-allocating and doubling the size is an execution-speed-over-memory optimization, since realloc calls are expensive. If you are using that approach, you have already decided that execution speed matters and memory use is less important.
Question 1: Is this the most performant way of tackling this problem?
You wouldn't check for sizes in the middle of iteration, but before it. Also, repeated if checks in a loop are to be avoided if possible, since they create extra branching.
Instead, check whether the new size fits in the allocated chunk, realloc if it doesn't, and only then start the loop doing stuff on the data. From a program design perspective, allocation and the algorithm should ideally not get mixed together in a tightly coupled mess, if that can be avoided.
And as mentioned, you wouldn't do a final realloc call either.
Question 2: Since we need to keep track of the size of the array do most people create a struct for this?
Yes and keep that struct private through the "opaque type" design pattern, hiding details about allocation away from the caller.
Upvotes: 2
Reputation: 431
Question 1
Your approach gives O(log(unknown)) calls to realloc and about O(unknown) of extra copying (10 + 20 + ... + unknown / 2 is a geometric series, approximately equal to unknown). This is not bad, but you can exploit the fact that allocation is lazy, at least on Linux systems. You can start by allocating some upper bound (e.g. twice the RAM you have available), and then just fill your array. When you are done, you can realloc. This requires no extra copying and only one call to realloc. Allocating too much memory will only reserve virtual address space; it will not use RAM until you write to it.
You could even consider not using realloc at all, since you have plenty of virtual address space and the realloc will likely copy your data.
Question 2
In most code I see, people just pass the size into functions explicitly, but I don't think there is anything wrong with using a struct. I would suggest using
typedef struct list
{
    size_t size;
    int *array;
} list;
otherwise you are limited to an array of 8 GiB, assuming int is 32 bits.
Upvotes: 1