jkeys
jkeys

Reputation: 3955

How can standards guarantee a data structure uses contiguous memory?

I was just wondering how exactly memory works such that a language standard (such as C++'s ISO/ANSI standard) can guarantee that any data structure (even an array) will be contiguous.

I don't even know how to write a data structure using continguous memory, but could you please give me a short example of how designers can do this?

For instance, assuming std::vector from C++ allocates all of its memory at runtime, how would it know that the memory slots ahead of the current allocated memory is not in use (and thusly, free for use by the vector)? Do vectors just look quite far ahead and hope the user doesn't try and push_back so many objects that it can no longer store it in a contiguous memory block? Or does the operating system move around memory at will to keep this from becoming a problem (no idea how this would work)?

Upvotes: 2

Views: 899

Answers (3)

Martin Liversage
Martin Liversage

Reputation: 106916

Your question seems to be about how to understand the concept of memory allocation from a beginners point of view. Let me try to explain what is going on in a very simplified manner. As an example we can think of a C++ program that adds a lot of elements to a std::vector.

When the program starts the C++ runtime will call the operating system to allocate some memory. This piece of memory is called the heap and it is used when dynamic memory is required by the C++ program. Initially the heap is mostly unused, but calls to new and malloc will carve out blocks of memory on the heap. Internally the heap uses some bookkeeping information to keep track of the used and free ares of the heap.

Exactly how std::vector behaves internally depends on the implementation, but in general it will allocate a buffer for the elements of the vector on the heap. This buffer is big enough to accommodate all the elements in the vector, but most of the time it has some free space at the end. Here is a buffer that stores 5 elements and has space enough for 8 elements. The buffer is located at address 1000 on the heap.

1000: X X X X X _ _ _

The std::vector keeps track of both the number of elements in the vector (5) and the size (8) and location (1000) of the buffer.

Here is the buffer after push_back is called to add a new element to the vector:

1000: X X X X X X _ _

That can be done two more times until all space has been used in the buffer.

1000: X X X X X X X X

But what happens if push_back is called once more? The vector has to increase the size of the buffer. The buffer is allocated on the heap and if the area right after the buffer is unused it may actually be possible to simply grow the buffer. However, most of the time the memory has been allocated to some other object. This is something the heap keeps track of. For the vector to be able to grow the buffer it has to allocate a completely new buffer with an increased size. Many implementations will simply double the size of the buffer. Here is the new buffer that now stores 9 elements and has room for 16 elements. The new buffer is allocated at address 2000 on the heap:

2000: X X X X X X X X X _ _ _ _ _ _ _

The contents of the old buffer is copied to the new buffer, and this operation can be costly if the buffer is big.

In case you wonder the heap may also grow while the program is running just as individual blocks allocated on the heap may grow. This will increase the memory consumption of the program. As more and more elements are added to the vector the heap will have to grow until the operating system refuses to increase the size of the heap. When that happens the program will fail with an out of memory condition.

To sum up:

  • The operating system supplies memory for the heap that can grow in size until the limits of the operating system has been reached.
  • The memory allocation routines in C++ can allocate and free blocks of fixed size on the heap.
  • The std::vector will preallocate a buffer to allow the vector to grow, but if the vector grows beyond the size of the buffer it will allocate a new buffer and copy the entire contents of the vector to this new buffer.

Upvotes: 3

Sean
Sean

Reputation: 4470

One thing to remember is that all modern operating systems have something called Virtual Memory. This allows any program the ability to assume that it has complete access to the largest possible amount of memory for that architecture, regardless of other processes, or even the amount of physical RAM in the computer. So when compiling, the compiler can automatically assign enough space on the stack (if the size needed is known at compile-time), or it can write the assembly to dynamically allocate memory on the heap as if the entire memory space were available. It's a level of abstraction that makes things incredibly easier.

What the OS does is take care of swapping the pages in and out of physical memory when they are needed for the various running programs. That way, no program needs to worry about the others, but you can still have simultaneously running programs.

I am of course vastly simplifying what is really going on, but I think this at least answers your questions on how the compiler can tell what memory is free; because it can assume that all memory starts out empty.

Upvotes: 1

Alex Martelli
Alex Martelli

Reputation: 882481

The standards may demand that a compliant implementation behave in a certain way -- they can "guarantee" only that compliant implementations behave that way, because, by definition, an implementation that violates a requirement in the standard is NOT compliant.

How a typical implementation of std::vector complies with the contiguous-memory requirement: it allocates a certain amount of space ("capacity"), and if the current size ever catches up with the capacity, it reallocates the space (which amounts to doing a completely new allocation, typically some fixed multiple of the current capacity, copying the current contents to the new space, and releasing the previously obtained space).

Upvotes: 3

Related Questions