Arch1tect
Arch1tect

Reputation: 4281

Unordered_set iterator random

I read about an interview question from Google about designing a class that support fast insert, erase and erase random element. I'm thinking about unordered_set in cpp, insert and erase are already there. Then for remove random element, I think unordered_set 's begin() method points to a random element, and I can just grab its value and erase it from the set. Does this always work as erasing a random value from the set? Thanks!

Edit: Feel free to comment if you can think of some other data structure, doesn't have to be unordered_set.

Upvotes: 1

Views: 1334

Answers (1)

Milan Banković
Milan Banković

Reputation: 41

I do not think that taking the begin()'s value will be random enough. Probably it is better to do some randomization yourself. One way is to randomly choose a bucket from the hash table and to take the begin()'s value of that bucket:

#include <unordered_set>
#include <random>

// Assume that T is some arbitrary type for which std::hash<T> is defined
std::unordered_set<T> myset; 

// put some elements into the set

unsigned bcount = myset.bucket_count(); // get the number of buckets
std::mt19937 rng(time(0)); // random number generator (seeded with time(0))

// returns a number in [0, bcount - 1]
uniform_int_distribution<unsigned> d(0, bcount - 1); 

// returns a random bucket index
unsigned rbucket = d(rng); 

// returns the beginning element of the selected bucket
auto it = myset.begin(rbucket); 
myset.erase(it); // removes the selected element

This is certainly more random than taking the begin()'s value, but still not uniform, since the buckets' beginning elements are preferred. If you want to guarantee the uniform distribution over the entire container, you can simply take a random value r in [0, myset.size()-1], and to iterate through the set to reach that element:

#include <unordered_set>
#include <random>

// Assume that T is some arbitrary type for which std::hash<T> is defined
std::unordered_set<T> myset;

// put some elements into the set

std::mt19937 rng(time(0)); // random number generator (seeded with time(0))
uniform_int_distribution<unsigned> d(0, myset.size() - 1); 

// returns a random number from [0, myset.size() - 1]
unsigned r = d(rng); 

// iterates through the container to the r-th element
auto it = myset.begin();
for(; it != myset.end() && r > 0; ++it, r--);
myset.erase(it); // erasing the selected element

This removes an element with the (pseudo)uniform probability, but it is not so efficient, since it requires iteration through the container. I think that you cannot do better than that by using std::unordered_set.

Upvotes: 2

Related Questions