Stein
Stein

Reputation: 1639

Can I pop from a HashSet efficiently?

My algorithm needs to iteratively shrink a set by removing an element, and do something with the element removed and with the shrinking set in each iteration. And:

By the way, the algorithm is the basic form of the Bron–Kerbosch algorithm. Smarter versions of that algorithm work faster (mostly) because they don't leave the choice of element arbitrary and I'd like to learn how much that effort pays off, compared to optimizing the pop operation.

Python sets have a pop member that pretty much does that. In Scala and Go, picking and removing the "first" element of a hash set seems to work fine (where "first" corresponds to the iterator). In Rust, that is something like:

// split off an arbitrary element from a (non-empty) set
pub fn pop<T>(set: &mut HashSet<T>) -> T
where
    T: Eq + Clone + std::hash::Hash,
{
    let elt = set.iter().next().cloned().unwrap();
    set.remove(&elt);
    elt
}

This seems to be a performance bottleneck compared to other languages, but even to a seemingly naive way do this kind of iteration in Rust: copying the sequence, then popping elements in sequence. I benchmarked some implementations of a pop-like function on the playground but none perform well compared to the naive way.

At first, I thought I saw that removing an element is not expensive, but picking one with iter().next() is. But on closer examination, that's not true, at least compared to other languages (*).

Using retain understandably doesn't help: it always iterates the whole set. Are there any other alternatives?

(*) On closer examination, iter().next() is quite cheap, as far as microbenchmarking can be trusted. Separate microbenchmarks say that picking an arbitrary element from a set costs (in nanoseconds on my system):

| Type of set      | Number of elements in set instance
|                  | 100 | 10,000 | 1,000,000
| Rust HashSet     |   2 |      2 |         2
| Rust BTreeSet    |  11 |     12 |        13
| Go map[]struct{} |  27 |     31 |        94
| Python set       | 125 |    125 |       125

Upvotes: 16

Views: 9561

Answers (3)

Laney
Laney

Reputation: 1649

Your code can be simplified a bit:

let elt = set.iter().next().cloned().unwrap();
set.take(&elt).unwrap()

If you want to remove all elements from a HashSet then you should use the drain iterator - it is very efficient.

HashSet from the Rust standard library is not that fast. Try replacing it with one from the hashbrown crate.

Upvotes: 1

Shepmaster
Shepmaster

Reputation: 430544

the set I'm using has integers

Don't use a HashSet; A BTreeSet has better and more consistent performance.

For N = 100000...

BTreeSet

sequenced : 3065.098µs
pop_1     : 2941.876µs
pop_2     : 2927.429µs

HashSet

sequenced : 3091.454µs
pop_1     : 172547.080µs
pop_2     : 807182.085µs

Upvotes: 7

Stein
Stein

Reputation: 1639

I guess the same advice applies as in Can I randomly sample from a HashSet efficiently?: copy the set as a vector just to iterate over it as shown in the "sequenced" solution in the benchmark:

let seq: Vec<u32> = set.iter().cloned().collect();
for elt in seq {
    set.remove(&elt);

That means this answer is not applicable if you need to shrink the set (pick an arbitrary element) only once or a few times, or if the set contents cannot be cheaply cloned.

Upvotes: 3

Related Questions