Reputation: 5302
I have to implement a function that prints on the console every string in lexicographically order that has, as first letter, char c
, using only stl algorithms.
Here is what i thought:
void f(const std::vector<std::string>& vs, const char c)
{
std::vector<std::string> tmp = vs;
std::sort(tmp.begin(), tmp.end());
std::ostream_iterator<std::string> out(std::cout, "\n");
std::copy_if(tmp.begin(), tmp.end(), out, *predicate*);
}
As predicate I thought:
//*(tmp.begin()->begin()) == c);
But it doesn't work.
Upvotes: 1
Views: 343
Reputation: 114481
I think it's a waste to sort all the elements and then printing only those starting with c
. What about instead sorting only those?
struct first_char_is {
char x;
first_char_is(char x) : x(x) {}
bool operator()(const std::string& s) {
return s.size() > 0 && s[0] == x;
}
};
void f(const std::vector<std::string>& vs, const char c)
{
std::vector<std::string> tmp;
std::copy_if(vs.begin(), vs.end(), std::back_inserter(tmp),
first_char_is(c));
std::sort(tmp.begin(), tmp.end());
std::ostream_iterator<std::string> out(std::cout, "\n");
std::copy(tmp.begin(), tmp.end(), out);
}
In C++ however strings are mutable and COW string implementations have their own problems. This means that when you duplicate a vector of strings all the string data gets also duplicated. To save memory a different approach would be to just keep and sort the indexes to the original array, but I'm not sure if this fits the "stl-only" artificial requirement (whatever that could mean).
struct IndirectComp {
const std::vector<std::string>& vs;
IndirectComp(const std::vector<std::string>& vs) : vs(vs) {}
const bool operator()(int a, int b) {
return vs[a] <= vs[b];
}
};
void f(const std::vector<std::string>& vs, const char c)
{
std::vector<int> ix;
for (int i=0,n=vs.size(); i<n; i++) {
if (vs[i].size() && vs[i][0] == c) {
ix.push_back(i);
}
}
std::sort(ix.begin(), ix.end(), IndirectComp(vs));
for (int i=0,n=ix.size(); i<n; i++) {
std::cout << vs[ix[i]] << "\n";
}
}
Upvotes: 4
Reputation: 16824
Like Benjamin Lindley, I think the accepted answer is sub-optimal, and this might be a better approach (untested, but you get the idea):
void f(std::vector<std::string> vs, const char c)
{
std::vector<std::string> result;
std::copy_if(vs.begin(), vs.end(), std::back_inserter(result),
[c](const std::string& s) { return !s.empty() && s.front() == c; });
std::sort(result.begin(), result.end());
std::copy(result.begin(), result.end(), std::ostream_iterator(std::cout, "\n"));
}
If we assume the input vector has N
entries of which K
begin with the letter c
, then this performs an O(N)
search/copy followed by an O(K.logK)
(average) sort, and then an O(K)
"copy" to the output stream. The approach in Zeta's answer has an O(N.logN)
sort first, which will dominate if K << N
(as we might expect for regular text).
EDIT: As Jerry Coffin's answer points out, if messing with the input vector is acceptable (it's a const reference in the original question), then you can get away without a temporary copy by using std::partition
-- kudos to him for thinking of this.
Upvotes: 1
Reputation: 490148
The answers you've gotten are simple and neat looking, but could be quite inefficient if you have a lot of data that doesn't fit the filter (starting with 'c' in this case).
I see two basic problems. First, they sort all the data, whether it fits the filter or not. That would be quite inefficient in itself. Second, they then use copy_if
to do a filtered copy of the data--but copy_if does nothing to take advantage of the sorting. It does a linear search, so it looks at all all the input data, including a lot that the right algorithm would already know wasn't worth considering (e.g., once it gets to something that starts with 'd', it might as well stop, because no more data is worth considering).
Alternatively, they do the filtering first, but do so by copying all the relevant data to a newly created vector, then sorting that copy of the data. This might be reasonably efficient with respect to speed, but could use quite a bit of extra memory.
I think it's better to filter first but without unnecessary copying, then sort only the data that fits the filter, and finally copy the sorted data to the output. In this case, we can use std::partition
to filter the data efficiently.
auto end = std::partition(in.begin(), in.end(),
[](std::string const &s) { return s[0] == 'c';});
std::sort(in.begin(), end);
std::copy(in.begin(), end, std::ostream_iterator<std::string>(std::cout, "\n"));
Short of an exceptionally horrible implementation of std::partition
, filtering then sorting should always be at least as fast as sorting then filtering -- and if a significant amount of the original input gets filtered out, filtering first is likely to be significantly faster. It can clearly save quite a bit of memory compared to creating a filtered copy, then sorting the copy. In most cases, it'll be quite a bit faster as well. Partition only has to swap strings, rather than copy them, which is typically quite a bit faster (the primary exception being short strings when std::string
uses the short string optimization).
Upvotes: 5
Reputation: 2767
There are 3 soloutions:
In all cases N is the size of your vector and n is the number of elements satisfying your predicate. In general n<=N and depending on your dataset (a normal english text) it couldbe n << N.
Upvotes: 0
Reputation: 105886
The easiest way is to use a lambda as predicate:
void f(std::vector<std::string> vs, const char c)
{
std::sort(vs.begin(), vs.end());
std::ostream_iterator<std::string> out(std::cout, "\n");
std::copy_if(vs.begin(), vs.end(), out,
[c](const std::string & s){return !s.empty() && s.front() == c;}
);
}
It's not possible to write a predicate using only <algorithm>
. However, the lambda can be reconstructed with std::bind
, std::equal_to
, std::string::front
, std::logical_and
and std::string::empty
from <functional>
. However, that would make your code very complicated.
Since you're already using C++11, I recommend you to use lambdas.
Upvotes: 1