Reputation: 22438
I want to implement a data structure which represents the abstract data type "cover of a set". The elements of the set are represented by integer indices, and so are the subsets. Each element uint64_t e
is assigned to at least one but possibly multiple subsets uint64_t s
. This can be implemented by storing the subset indices in a std::vector
. The number of subsets to which any element will be assigned is usually much smaller than the total number of elements.
Performance (time and memory) is important, so which implementation would you recommend?
std::vector<std::vector<uint64_t>>
std::vector<std::unordered_set<uint64_t>>
std::vector<std::set<uint64_t>>
Frequent operations include:
Upvotes: 3
Views: 486
Reputation: 275310
The fastest (in terms of implementation time) would be a pair of data structures:
std::vector< std::unordered_set<X> > set_to_elements;
std::unordered_map< X, std::unordered_set<std::size_t> > element_to_sets;
with the two kept coherent. boost
multi index containers may be able to do this bi directional stuff a bit more efficiently.
assigning an element to a subset
set_to_elements[subset].insert(element);
element_to_sets[element].insert( subset );
removing an element from a subset (and possibly moving it to another)
set_to_elements[subset].erase(element);
element_to_sets[element].erase( subset );
checking whether an element is a member of a particular subset
return set_to_elements[subset].find(element) != set_to_elements[subset].end();
or return element_to_sets[element].find(subset) != element_to_sets[element].end(); getting all the subsets to which an element belongs
return element_to_sets[element];
efficient iteration over all elements of a particular subset would be nice, but I believe this conflicts with other goals
return set_to_elements[subset];
All operations are constant time and linear memory. Memory and time requirements are roughly double what a compact one that requires only one of the last two above.
Micro optimizations to cache the result of []
operations should be done in real code if it is actually performance sensitive. Storing iterators from one container to the other, to make operation #1 and #2 faster, are optional, and might make them a touch faster, but I wouldn't bother.
Upvotes: 2
Reputation: 70516
You could try to read the paper Segmented Iterators and Hierarchial Algorithms by Matt Austern. He discusses how to efficiently process hierarchial data structures of the form container<container<T>>
. One problem that needs to be solved is iterating as if you have a flat container<T>
. For this, the Standard Library algorithms need to be specialized for so-called segmented iterators.
A segmented iterator is a two-level data structure that -apart from performing top-level iteration- also contains a local iterator to go one level deeper. Because these local iterators themselves can also be segmented iterators, this allows arbitrarily nested data structures (such as trees and graphs).
A set cover for a discrete set can be constructed as std::vector<std::set<T>>
. To apply STL algorithms to such a container is either cumbersome or requires segmented iterators and hierarchial algoritms. Unfortunately, neither the Standard Library nor Boost actually implement this, so you have some work to do there.
Upvotes: 1