user2980746
user2980746

Reputation: 401

Interacting with C arrays without knowing the size

So when we want to use an array without knowing the size then the common procedure is to start small and then keep doubling the size and reallocating it as we go along right?

And then I assume once we are done we will want to realloc once more to free up the excess memory that was malloced and we did not use?

#include <stdio.h>
#include <stdlib.h>

int main()
{
    // unknown = an unknown amount of work we need to do
    int unknown = 1234;

    // size = the current allocated memory
    int size = 10;

    // counter = the final size of the array
    int counter = 0;

    // first we allocate a small amount for our array
    int *array = (int *) malloc(size * sizeof(int));

    // and then start working
    for(int i = 0; i < unknown; i++)
    {
        // work
        array[i] = i; counter++;

        // check the size of the array to see if we need to realloc
        if (counter == size)
        {
            size *= 2;
            array = (int *) realloc(array, sizeof(size));
        }
    }

    // when all of the work is done we then shorten it to the exact size
    array = (int *) realloc(array, sizeof(counter));

    printf("%d", counter);
}

Question 1: Is this the most performant way of tackling this problem?

Question 2: Since we need to keep track of the size of the array do most people create a struct for this?

typedef struct list
{
    int size;
    int *array;
} list;

Upvotes: 0

Views: 139

Answers (3)

legendaryCoder69
legendaryCoder69

Reputation: 21

If you malloc'd the array. You can use platform specific methods to get the size of the array. On Windows you can use VirtualQuery with mbi.RegionSize to get the size.

Upvotes: 0

Lundin
Lundin

Reputation: 214770

the common procedure is to start small and then keep doubling the size and reallocating it as we go along right?

I'd say the most common approach is to start reasonably and then indeed keep doubling the size. Usually to a multiple of the CPU word alignment.

And then I assume once we are done we will want to realloc once more to free up the excess memory

No. Pre-allocating and doubling the size is an execution speed over memory optimization, since realloc calls are expensive. If you are using that approach, you have already decided that execution speed matters and memory use is less important.

Question 1: Is this the most performant way of tackling this problem?

You wouldn't check for sizes in the middle of iteration, but before it. Also repeated if checks in a loop are to be avoided if possible since that creates extra branching.

Instead check if new size fits in the allocated chunk, if not realloc, and then start the loop doing stuff on the data. From a program design perspective, allocation and algorithm should ideally not get mixed together in a tight-coupled mess, if that can be avoided.

And as mentioned, you wouldn't do a final realloc call either.

Question 2: Since we need to keep track of the size of the array do most people create a struct for this?

Yes and keep that struct private through the "opaque type" design pattern, hiding details about allocation away from the caller.

Upvotes: 2

asdfldsfdfjjfddjf
asdfldsfdfjjfddjf

Reputation: 431

Question 1

Your approach gives O(log(unknown)) calls to realloc and about O(unknown) of extra copying (10 + 20 + ... + unknown / 2 is a geometric series and approximately equal to unknown). This is not bad, but you can exploit that allocation is lazy, at least on Linux systems. You can start by allocating some upper bound (e.g. twice the RAM you have available), and then just fill your array. When you are done, you can realloc. This requires no extra copying and only one call to realloc. Allocating too much memory will only reserve virtual address space, it will not use RAM until you write to it. You could even consider not using any realloc at all as you have plenty of virtual address space and the realloc will likely copy your data.

Question 2

In most code I see, people just pass the size into functions explicitly, but I don't think there is anything wrong with using a struct. I would suggest to use

typedef struct list
{
    size_t size;
    int *array;
} list;

otherwise you are limited to an array of 8 GiB, assuming int is 32 bits.

Upvotes: 1

Related Questions