Reputation: 852

Find first value not in std::set<int> min-max

How do I find the first value h that is not in std::set<int> ids such that result is clamped to [0, *ids.rbegin() + 1] range.

I see this is rather simple problem, but I didn't find any matching question yet. Basically I want the inverted set of ids so that I can use them.

So far I have following:

#incldue <set>
std::set<int> ids;
int h = 0;
for(; !ids.empty() && (h <= *ids.rbegin() + 1); ++h) {
    if(!ids.count(h)) {
        break;
    }
}
// h is now smallest value not in ids.

I suspect this be improved further such that e.g the loop is not required?

@edit: to clarify what values are in the set: In my use case the value generated by the algorithm is inserted into the set. I should have really said std::set<unsigned int>. I'm happy to so much discussion done on this question!

Upvotes: 1

Answers (7)

Yakk - Adam Nevraumont

Reputation: 275500

std::set is not optimized for this problem.

Naive approaches both give you O(n) performance and bad O(n) performance because you are walking a node based datastructure.

What you want is a sorted "leapable" datastructure (be it some kind of tree, or a skip list) where the size of the leaps is recorded.

Then you can track the delta-id and the size of the leap; if the leap is smaller than the id difference, there is an empty id in there. If not, there is no empty id in there.

None of the associative containers in std keep track of the information you need, and retrofitting them isn't practical as you are not given access to the "leap" based iteration or structure.

With such a datastructure, insert and remove is O(lgn) as is "find first unused id". Without it, find first unused id is O(n).

That involves some heavy codesmithing. We can do almost as good, with higher constant costs, by simply storing a set of ranges and wrapping it to guarantee that the ranges don't overlap.

struct range {
  int base = 0;
  int length = 0;
  friend bool operator<( range lhs, range rhs ) {
    if (lhs.base != rhs.base) return lhs.base < rhs.base;
    return lhs.length < rhs.length;
  }
  bool operator()(int x) const {
    ERROR( if (length < 0) std::cout << "bad length\n"; )
    ERROR( if (base < 0) std::cout << "bad base\n"; )
    return (x>=base) && (x<(base+length));
  }
};

range is a half-open interval going from [base, base+length). Thus [x,0) is an empty range for all x and [x,1) just contains x.

It is ordered by base. If you ask for the lower bound of an ordered collection on [x,0)

Now we make a std::set<range> and wrap it up:

struct id_set {
  bool taken(int x) const;
  int find_unused() const;
  void take(int x);
  void recycle(int x);
  int take_unused() {
    auto r = find_unused();
    take(r);
    return r;
  }
  std::size_t count() const {
    std::size_t r = 0;
    for (auto e:state)
      r += e.length;
    ERROR( if (r!=counter) std::cout << "Counter failure\n"; )
    return r;
  }
private:
  std::set<range> state;
  using iterator = std::set<range>::iterator;
  iterator get_interval(int x) const;
  std::size_t counter = 0;
};

id_set::iterator id_set::get_interval(int x) const {
  auto start = state.lower_bound( {x,0} );
  if (start != state.end())
      if ((*start)(x))
        return start;

  if (start == state.begin() )
    return state.end();

  auto prev = std::prev(start);
  if ((*prev)(x))
    return prev;

  return state.end();
}
bool id_set::taken(int x) const {
  return get_interval(x) != state.end();
}

int id_set::find_unused() const {
  auto it = state.begin();
  if (it == state.end()) return 0;
  auto r = it->base + it->length; // we ensure these intervals are never adjacent; thus the element after the first interval is untaken
  ERROR( if (taken(r)) std::cout << "find_unused failed\n"; )
  return r;
}

void id_set::take(int x) {
  if (taken(x)) return; // nothing to do
  ++counter;

  auto merge_with_next = [&](auto next) {
    VERBOSE(std::cout << "merge_with_next\n"; )
    auto tmp = *next;
    tmp.base = x;
    ++tmp.length;
    state.erase(next);
    state.insert(tmp);
    ERROR( if (!taken(x)) std::cout << "merge_with_next failed\n"; )
  };
  auto merge_with_prev = [&](auto prev) {
    VERBOSE(std::cout << "merge_with_prev\n"; )
    auto tmp = *prev;
    ++tmp.length;
    state.erase(prev);
    state.insert(tmp);
    ERROR( if (!taken(x)) std::cout << "merge_with_prev failed\n"; )
  };
  auto merge_prev_and_next = [&](auto prev, auto next) {
    VERBOSE(std::cout << "merge_prev_and_next\n"; )
    auto tmp = *prev;
    tmp.length += next->length + 1;
    state.erase(prev);
    state.erase(next);
    state.insert(tmp);
    ERROR( if (!taken(x)) std::cout << "merge_prev_and_next failed\n"; )
  };
  auto insert_in_gap = [&] {
    VERBOSE(std::cout << "insert_in_gap\n"; )
    state.insert( {x, 1} );
    ERROR( if (!taken(x)) std::cout << "insert_in_gap failed\n"; )
  };

  if (state.empty())
    return insert_in_gap();

  auto start = state.lower_bound( {x,0} );

  // this is before the beginning, and there is a gap:
  if (start == state.begin() && start->base > x+1)
    return insert_in_gap();
  if (start == state.begin()) {
    // no gap and just before start
    return merge_with_next(state.begin());
  }
  // this is valid, because we are not begin:
  auto prev = std::prev(start);
  if (start == state.end() || start->base != x + 1) {
    if (prev->base + prev->length == x)
      return merge_with_prev(prev);

    return insert_in_gap();
  }     
  // both prev and start are valid iterators
  // start->base == x+1
  if (prev->base + prev->length == x)
    return merge_prev_and_next(prev, start);

  return merge_with_next(start);
}
// return an id:
void id_set::recycle(int x) {
  auto it = get_interval(x);
  if (it == state.end()) return; // nothing to do
  --counter;

  // create two intervals, one before and one after:      
  auto lhs = *it;
  lhs.length = x-lhs.base;
  auto rhs = *it;
  rhs.base = x+1;
  rhs.length -= lhs.length+1;
  // remove this interval:
  state.erase(it);
  // insert either non-zero length interval:
  if (lhs.length > 0)
    state.insert(lhs);
  if (rhs.length > 0)
    state.insert(rhs);
  ERROR( if (taken(x)) std::cout << "recycle failed\n"; )
}

there are probably typos above. But the core idea is that take and recycle are both O(lgn) operations, as is find_unused. Thus take_unused is also O(lgn).

Live example

Upvotes: 2

JATothrim

Reputation: 852

I already accepted @Drew Dormann's answer. However I modified the algo such that I can to allow inserting the values into std::list<int> keeping the list sorted. I also made it a generic template function:

#include <algorithm>
#include <list>
#include <iterator>
#include <iostream>
// Make smallest sequential val that is not in c.
// returns [val, *c.rbegin() + 1] and suitable iterator to insert val into c.
template<typename Cont>
void make_minimal_id(const Cont & c, typename Cont::value_type & val,
                     typename Cont::const_iterator & x)
{
    if( c.empty()) {
        x = c.begin();
    } else if ( *c.begin() > val ) {
        x = c.begin();
        val = *x - 1;
    } else {
        // "one algo that gets lost in the back of the drawer and forgotten."
        x = std::adjacent_find(c.begin(), c.end(),
            [](typename Cont::value_type a, typename Cont::value_type b)
            { return a + 1 != b; } );
        if ( x == c.end() ) {
            val = *c.rbegin() + 1;
        } else {
            val = *x++ + 1;
        }
    }
}
// pretty print container 
template<typename C>
void print_cont(const C & c) {
    std::cout << std::accumulate(std::next(c.begin()), c.end(),
        std::to_string(c.front()), // start with first element
        [](std::string a, int b) {
            return a + ", " + std::to_string(b); }) << std::endl;
}

int main() {
    std::list<unsigned int> ids = { 3, 4, 5, 6, 9 };
    print_cont(ids);
    for(int i = 0; i < 6; ++i) {
        unsigned int id = 0;
        std::list<unsigned int>::iterator x;
        make_minimal_id(ids, id, x);
        ids.insert(x, id);
        print_cont(ids);
    }
    // prints:
    //{3, 4, 5, 6, 9}
    //{2, 3, 4, 5, 6, 9}
    //{1, 2, 3, 4, 5, 6, 9}
    //{0, 1, 2, 3, 4, 5, 6, 9}
    //{0, 1, 2, 3, 4, 5, 6, 7, 9}
    //{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
    //{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    return 0;
}

make_minimal_id() now works for any sorted container and value. :-)

Upvotes: 0

Drew Dormann

Reputation: 63775

Since a std::set's elements are sorted, you can use std::adjacent_find.

std::adjacent_find(set.begin(), set.end(), [](int a, int b){ return a+1 != b; } );

This will return an iterator to the first element, a, which is not followed by the value a+1. Or set.end() if there is no such value.

Sample usage:

std::set<int> ids { -2, -1, 0, 1, 2, 4, 5, 6 };

// This code assumes a non-empty set, since the range in the question requires same
if ( *ids.begin() > 0 )
{
    // Special case for [0
    std::cout << 0;
}
else
{
    auto res = std::adjacent_find(ids.begin(),
                                  ids.end(),
                                  [](int a, int b){ return a+1 != b; } );
    if ( res == ids.end() )
    {
        // Special case for *ids.rbegin() + 1]
        std::cout << *ids.rbegin() + 1;
    }
    else
    {
        // Print the value that should have followed this one.
        std::cout << *res + 1;
    }
}

Output:

Upvotes: 7

Ap31

Reputation: 3324

If you insist on using std::set to store a set of integers, then unfortunately linear traversal (suggested in other answers) is the best option.

With a small modification to your search tree you can achieve O(log n) search for the smallest missing integer while retaining all the other properties - you just need to store the amount of entries smaller than the key for each tree node. Not sure whether you can implement it without rewriting the whole std::set though.
UPD: turns out you can - check the answer by @Yakk - Adam Nevraumont

In fact, by caching the smallest value after each operation on your container, you can get it to "find" your number in O(1), which is asymptotically unbeatable. Of course, it will not gain you any actual performance improvement.

Finally, if there are any additional limitations placed upon your integers, you can sometimes come up with some clever trick (usually happens in coding tasks from the training websites) - e.g. if you know that there is exactly one number missing in a range, you can calculate it arithmetically in O(1).

Upvotes: 0

Jarod42

Reputation: 217408

Your version is O(n log(n)).
You can have it in linear time (O(n)):

int h = 0;
for (const int e : ids) {
    if (e != h) {
        return h;
    }
    ++h;
}
return h;

If ids can contains negative numbers, then change the loop to:

int h = 0;
for (auto it = ids.lower_bound(0); it != ids.end(); ++it) {
    if (*it != h) {
        return h;
    }
    ++h;
}
return h;

With sorted vector, you can even reduce complexity to O(log(n)).

Upvotes: 0

Stephan Lechner

Reputation: 35154

The elements in the set are sorted. Hence, when you iterate the elements, you just have to detect the first "leak" (e.g. if the value of the set is larger than a linear increasing integral value). You can define where to start, e.g. you could write h=1 if you want to ignore value 0, and also the corner case of a set of continuous numbers is covered (result is then max+1):

int main() {
    std::set<int> mySet = {5,4,6,2,1,0};
    int h=0;
    for(auto val : mySet) {
        if (val != h) {
            break;
        }
        h++;
    }

    cout << "not in mySet:" << h << endl;

    return 0;
}

Upvotes: 0

Bhupesh Raj

Reputation: 31

std::set<int> ids;
int h = 0;
for( auto id : ids ) {
    if( id != h )
        break;
     h++;
}

Upvotes: 2

Find first value not in std::set&lt;int&gt; min-max

Answers (7)

Related Questions

Find first value not in std::set<int> min-max