user2343952
user2343952

Reputation: 23

efficient method to select index of vector in c++

In C++, suppose you have a vector with boolean values, and you want to select randomly one index among those corresponding to True values.

What is the most efficient method to use?

Example:

vector<bool> v(4);
v.at(0)=true
v.at(1)=false
v.at(2)=true
v.at(3)=true

You want to select a number among the subset {0,2,3}.

I have so far tried 2 methods:

  1. Stacking indexes in a vector and then selecting among these elements. Extremely slow.
  2. Naive method: randomly select a index until v.at(rnd_sel_index) is True. Considerably faster.

Any suggestions faster than method 2?

Upvotes: 2

Views: 1514

Answers (2)

Ryan Burn
Ryan Burn

Reputation: 2316

You can use the well-known method for selecting an element from a sequence of unknown length.

Example Code:

#include <random>                                                                                      
#include <iostream>                                                                                    
#include <vector>                                                                                      
#include <algorithm>                                                                                   
std::size_t choose_element(const std::vector<bool>& v) {                                               
  auto last     = v.end();                                                                             
  auto chosen_i = std::find(v.begin(), last, true);                                                    
  auto i        = std::find(std::next(chosen_i), last, true);                                          
  double n      = 2.0;                                                                                 
  static auto random_generator = std::mt19937{std::random_device{}()};                                 
  while (i != last) {                                                                                  
    if (std::bernoulli_distribution(1.0 / n)(random_generator))                                        
      chosen_i = i;                                                                                    
    i = std::find(std::next(i), last, true);                                                           
    ++n;                                                                                               
  }                                                                                                    
  return std::distance(v.begin(), chosen_i);                                                           
}                                                                                                      
int main() {                                                                                           
  std::vector<bool> v = {true, true, false, true};                                                     
  std::vector<int> indexes(v.size());                                                                  
  const double N = 100;                                                                                
  for (int i=0; i<N; ++i)                                                                              
    ++indexes[choose_element(v)];                                                                      
  for (auto& index : indexes)                                                                          
    std::cout << std::distance(indexes.data(), &index) << ": " << (index / N) << "\n";                 
  return 0;                                                                                            
}

This has predictable performance and only takes one pass through the data. Of course if you are taking multiple samples from the same vector it may be more efficient to restructure the data to a different format and then draw from that. Also, if nearly all of the elements are true, your method (2) might perform better in the average case.

Upvotes: 1

Richard Hodges
Richard Hodges

Reputation: 69854

Perhaps there's a more efficient approach.

Rather than storing what is there and what is not, perhaps it's better to store only what is not - i.e. a vector containing indices that are free.

the order of this vector can be easily randomised once, and you can then pull items from the back() until it's empty().

When you want to return items to the 'free index pool', simply insert them in a random position in the vector.

Upvotes: 3

Related Questions