Reputation: 2652
I need some help with choosing an efficient algorithm to put elements from a vector into presorted buckets - or ideally output iterator ranges (as they are efficient I think). The example below is totally contrived, but the idea is that a an element's key is used to determine the output bucket. I'm not asking how to sort in order as that is a very simple matter of simply calling (which works and reorders the elements according to the key)
std::sort(testVec.begin(), testVec.end(), comparator);
I put a live example on coliru, where it is very easy to modify and fix (well not that easy or I wouldn't be asking the question). I could also go through the elements in this sorted list and while the key value is the same, append it into a new bucket, but I am looking for something more STL like in nature, and right now the above smells like a bit of a last resort hack, also the final solution needs to be efficient as the testVec is potentially large and the objects are also big in size. I would prefer not to modify the testvec - so it should be immutable.
Ideally I am looking for some sort of construct that spits out range iterators or something equally efficient. The actual objects are large so passing references or moving them is really the only option - my actual objects (the equivalent to MyStr) are movable. Something sort of foreach key, apply key predicate or something which I cannot figure out is what I am looking for. I hard coded the 3 buckets below just to show what I need to be able to attain - this is totally a hack.
Thanks in advance for any help with this problem
#include <string>
#include <iostream>
#include <sstream>
#include <iterator>
#include <vector>
#include <algorithm>
struct MyStr
{
int key;
std::string strval;
MyStr(int key, const std::string& rStrVal)
: key(key)
, strval(rStrVal)
{}
// let stream operators be friend functions instead of members!
inline friend std::ostream& operator << (std::ostream& os, const MyStr& val) {
os << "key[" << val.key << "], strval['" << val.strval << "']";
return os;
}
bool operator < (const MyStr& str) const {
return (key > str.key);
}
};
int main()
{
std::vector <MyStr> testVec = {
MyStr(4, "key 4"),
MyStr(3, "key 3"),
MyStr(3, "key 3"),
MyStr(2, "key 2"),
MyStr(2, "key 2"),
MyStr(2, "key 2")
};
//auto comparator = [](const MyStr& lhs, const MyStr& rhs) {
// return lhs.key < rhs.key;
//};
std::vector <MyStr> foursBucket;
std::vector <MyStr> threesBucket;
std::vector <MyStr> twosBucket;
auto ostriter = std::ostream_iterator<MyStr>(std::cout, ",");
std::for_each(testVec.begin(), testVec.end(),
[&](const MyStr& next){
switch (next.key) {
case 4:
foursBucket.push_back(next);
break;
case 3:
threesBucket.push_back(next);
break;
case 2:
twosBucket.push_back(next);
break;
}
});
std::cout << "Elements with Key Value 2" << std::endl;
std::copy(twosBucket.begin(), twosBucket.end(), ostriter);
std::cout << std::endl;
std::cout << "Elements with Key Value 3" << std::endl;
std::copy(threesBucket.begin(), threesBucket.end(), ostriter);
std::cout << std::endl;
std::cout << "Elements with Key Value 4" << std::endl;
std::copy(foursBucket.begin(), foursBucket.end(), ostriter);
std::cout << std::endl;
}
produces the following output
Elements with Key Value 2
key[2], strval['key 2'],key[2], strval['key 2'],key[2], strval['key 2'],
Elements with Key Value 3
key[3], strval['key 3'],key[3], strval['key 3'],
Elements with Key Value 4
key[4], strval['key 4'],
As you can see the structure is very simple and I've shown how I can currently sort the objects using a predicate but I do not know which of the plethora of algorithms to choose from to efficiently iterate over
Upvotes: 0
Views: 173
Reputation: 109289
You're looking for an unordered_multimap
. It is an unordered associative container that'll place the key-value pairs into buckets depending on the hash value of the key (int
in the following example).
std::unordered_multimap<int, std::string>
mymap{{4, "key 4"},
{3, "key 3"},
{3, "key 3"},
{2, "key 2"},
{2, "key 2"},
{2, "key 2"},
};
for(auto const& kv : mymap) {
std::cout << "key: " << kv.first << " value: " << kv.second << '\n';
}
Output:
key: 2 value: key 2
key: 2 value: key 2
key: 2 value: key 2
key: 3 value: key 3
key: 3 value: key 3
key: 4 value: key 4
In the comment below you clarified that you receive a vector<MyStr>
input, and the container type cannot be changed. In that case, use std::equal_range
to find all the elements containing a particular key.
// comparator for equal_range
struct comp
{
bool operator()(int key, MyStr const& m) const { return m.key < key; }
bool operator()(MyStr const& m, int key) const { return key < m.key; }
};
// sort the vevctor
std::sort(testVec.begin(), testVec.end());
// search for all elements with key=2
auto range = std::equal_range(testVec.begin(), testVec.end(), 2, comp());
for(auto it = range.first; it != range.second; ++it) {
std::cout << "key: " << it->key << " value: " << it->strval << '\n';
}
Output:
key: 2 value: key 2
key: 2 value: key 2
key: 2 value: key 2
To iterate over each unique key, the easiest way is to use std::unique_copy
to create a new container that holds only elements that have unique keys. Then iterate over this container and use equal_range
on each key.
bool operator==(MyStr const& m1, MyStr const& m2) { return m1.key == m2.key; }
// sort the vevctor
std::sort(testVec.begin(), testVec.end());
std::vector<MyStr> unique_keys;
std::unique_copy(testVec.begin(), testVec.end(), std::back_inserter(unique_keys));
for(auto const& u : unique_keys) {
std::cout << "Searching for key: " << u.key << '\n';
auto range = std::equal_range(testVec.begin(), testVec.end(), u.key, comp());
for(auto it = range.first; it != range.second; ++it) {
std::cout << "key: " << it->key << " value: " << it->strval << '\n';
}
}
If the elements are expensive to copy, and you'd rather avoid having to create a new container holding unique elements, you could create your own output iterator that mimics std::back_insert_iterator
. Its operator=
implementation will take a MyStr const&
argument, but push_back
only the key from the argument into the unique key container, which would be vector<int>
in that case.
Another approach (that I'm not certain will work) that you can use to avoid modifying the input range, and avoid copying the elements to a new range, is to create vector<MyStr *>
, where each element points to the corresponding element in the original range. Then repeat all the steps above, except, instead of passing vector::iterator
s to the algorithms, use boost::indirect_iterator
. This iterator will apply an extra level of dereferencing to the pointers in your container, and the algorithms should then work as if they were operating on vector<MyStr>
instead.
Upvotes: 2