Heychsea

Reputation: 133

What is the time complexity of iterating BTreeSet and HashSet?

For example, with HashSet, I know that getting one known element is usually O(1), but I want to know the time complexity of getting all elements (without knowing them in advance, i.e. an iteration).

I can't find this information anywhere in the standard library's documentation. I have also looked at SwissTable, without success.

Is it even measurable? Where can I find it?

Upvotes: 7

Views: 6341

Answers (3)

Matthieu M.

Reputation: 300129

TL;DR:

  • BTreeSet: O(N)
  • HashSet: O(capacity)

BTreeSet

The B-Tree data-structure is a Tree of Arrays of K elements, for some value of K.

The depth of the Tree is O(log N), and nodes are merged together when their arrays are not full enough. For our purposes, we can use the rule that a node is always at least half full, although any constant fraction works.

In general, iteration is done from smallest to largest, which is an in-order traversal. This implies that moving from one element to the next is not strictly O(1); indeed, moving from the right-most element of the left sub-tree back up to the root takes O(log N) steps.

It can be shown that the amortized cost per step is O(1), which leads to an O(N) overall traversal complexity.
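
A minimal Rust sketch of this behaviour (the element values are arbitrary; the point is that the whole loop is O(N) even though a single step may not be O(1)):

use std::collections::BTreeSet;

fn main() {
    let set: BTreeSet<u32> = [5, 1, 4, 2, 3].into_iter().collect();

    // In-order traversal: elements come out smallest to largest.
    // A single step may climb O(log N) levels of the tree, but the
    // whole walk touches each node a bounded number of times, so the
    // complete loop is O(N).
    for x in &set {
        println!("{x}");
    }
}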

HashSet

There is no general iteration complexity for hash maps, or hash sets; it varies by implementation.

The implementation in Rust is essentially an open-addressed hash table. This means one large array of K slots (K = capacity), more or less sparsely populated.

As with most open-addressed hash tables, there is no way to short-circuit the iteration. Instead, each slot of the array is checked in turn.

The iteration time is thus proportional to the capacity, regardless of the number of elements. On a sparsely populated hash-table, that's quite expensive.
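
A quick sketch that makes this concrete (the one-million capacity is an arbitrary number for illustration; the table may reserve more slots than requested):

use std::collections::HashSet;

fn main() {
    // Reserve room for a million elements, but insert only ten.
    let mut set: HashSet<u64> = HashSet::with_capacity(1_000_000);
    set.extend(0..10);

    // Iteration scans the backing array slot by slot, so the cost of
    // this loop tracks the capacity (at least 1,000,000 slots), not
    // the 10 elements actually stored.
    for x in &set {
        println!("{x}");
    }

    // If iteration dominates the workload, shrinking first brings the
    // capacity back in line with the element count.
    set.shrink_to_fit();
}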

Note: the Swiss table is a variation on open-addressed hash tables; this does not affect the fundamental properties of the various operations.

Upvotes: 10

Emerson Harkin

Reputation: 917

If I understood your question, you're asking how much time it takes to visit every item in a collection, in no particular order. For any collection of n items, the best case is Ω(n), because retrieving an item takes at least one operation. Conversely, as long as you can retrieve the next item in a collection in a constant (or constant on average) number of operations, the worst case is O(n).

In principle, it's possible to do much worse than O(n) if you really try. For example, you could iterate over a HashMap containing n items by trying each of m > n keys, so that the complexity would be O(m) instead of O(n).
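
A hypothetical sketch of that contrast (the 1,000-key probe range and the map contents are made up for illustration):

use std::collections::HashMap;

fn main() {
    let map: HashMap<u32, &str> = HashMap::from([(2, "two"), (7, "seven")]);

    // Naive "iteration": probe each of m = 1,000 candidate keys, even
    // though the map holds only n = 2 entries. This costs O(m) lookups.
    for key in 0..1_000u32 {
        if let Some(value) = map.get(&key) {
            println!("{key} -> {value}");
        }
    }

    // The real iterator visits each stored entry once: O(n) entries
    // (plus a capacity-proportional scan, per the answer above).
    for (key, value) in &map {
        println!("{key} -> {value}");
    }
}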

If you're really worried that iteration for a particular collection is implemented naively, for now it seems like the only way to know is to go digging through the source code. Following the breadcrumbs in HashMap, for example, eventually leads to this method, which is used to iterate over the contents of this struct, but it's a bit difficult to interpret if (like me) you aren't familiar with all of the implementation details.

Upvotes: 2

maio290

Reputation: 6742

Currently, our implementation simply performs naive linear search. This provides excellent performance on small nodes of elements which are cheap to compare. However in the future we would like to further explore choosing the optimal search strategy based on the choice of B, and possibly other factors. Using linear search, searching for a random element is expected to take O(B * log(n)) comparisons, which is generally worse than a BST. In practice, however, performance is excellent.

Source: the BTreeMap documentation.

From this reference, I'd assume that HashSet behaves more or less like HashMap:

The default hashing algorithm is currently SipHash 1-3, though this is subject to change at any point in the future. While its performance is very competitive for medium sized keys, other hashing algorithms will outperform it for small keys such as integers as well as large keys such as long strings, though those algorithms will typically not protect against attacks such as HashDoS.

Source: HashMap

Since this doesn't state anything specific, I'd assume that O(1) per element should apply most of the time. This thread (although about Java) has some very good answers.

In very simple words: the complexity of an algorithm is determined by looking at the source code. For a two-dimensional array, the runtime (without doing anything in the inner loop) would be O(n²), because you'd have two loops running n times each:

for(int i = 0; i < arr.length; i++)        // outer loop: n iterations
{
    for(int j = 0; j < arr[0].length; j++) // inner loop: n iterations per outer pass
    {
        // do something -> executed n * n = n^2 times in total
    }
}

For further reference, you may check out the Wikipedia article on Big O notation.

Upvotes: 0
