Initializing an Array in the Context of Studying Data Structures

Question

I am reading CLRS's Introduction to Algorithms and there is question 11.1 Exercise 4 in the book under the section Direct-Address Tables :

We wish to implement a dictionary by using direct addressing on a huge array. At
the start, the array entries may contain garbage, and **initializing** the entire array
is impractical because of its size. Describe a scheme for implementing a direct address
dictionary on a huge array. Each stored object should use O(1) space;
the operations SEARCH, INSERT, and DELETE should take O(1) time each; and
initializing the data structure should take O(1) time. (Hint: Use an additional array,
treated somewhat like a stack whose size is the number of keys actually stored in
the dictionary, to help determine whether a given entry in the huge array is valid or
not.)

I understand the solution is just to create another array, and have it store pointers to this array for elements that exist.

But I'm slightly confused as to the meaning of "initialize" in this context. If the array is not initialized, how can we even access the data (i.e. get the value at the i-th position with A[i])?

I'm also not sure why the question states this memory constraint. Suppose we could initialize the array, how would the answer change?

Matt Timmermans · Accepted Answer

The problem is that initializing an array of length N -- setting all the elements to a known value like NULL -- takes O(N) time.

If you have an array that is initialized to NULL, then implementing a direct access table is super easy -- A[i] == NULL means there is no value for i, and if there is a value for i, then it's stored in A[i].

The question is about how to avoid the O(N) initialization cost. If the array is not initialized, then the initial values for all A[i] could be anything at all... so how do you tell if it's a real value or just the initial garbage?

The solution is not just to create another array that stores pointers to the original -- you would have to initialize that other array and then you've wasted O(N) time again.

To avoid that cost altogether, you have to be more clever.

Make 3 arrays A, B, and C, and keep a count N of the total number of values in the dictionary.

Then, if the value for i is v:

A[i] = v;
0 <= B[i] < N; and
C[B[i]] = i

This way, the B and C arrays let you keep track of which indexes in A have been set to a real value, without initializing any of the arrays. When you add a new item, you check conditions (2) and (3) to see if the index valid, and if it isn't, then you do:

A[i] = NULL
B[i] = N
C[N++] = i

This marks index i as valid, and conditions (2) and (3) will then pass for all future checks.

Because of the amount of memory it takes, this technique isn't often used in practice, BUT it does mean that theoretically, you never have to count the cost of array initialization when calculating run time complexity.

Initializing an Array in the Context of Studying Data Structures

Answers (2)

Related Questions