Francesco Bonizzi
Francesco Bonizzi

Reputation: 5302

How to implement this function using only stl algotithms

I have to implement a function that prints on the console every string in lexicographically order that has, as first letter, char c, using only stl algorithms.

Here is what i thought:

void f(const std::vector<std::string>& vs, const char c)
{   
    std::vector<std::string> tmp = vs;

    std::sort(tmp.begin(), tmp.end());
    std::ostream_iterator<std::string> out(std::cout, "\n");
    std::copy_if(tmp.begin(), tmp.end(), out, *predicate*); 

}

As predicate I thought:

//*(tmp.begin()->begin()) == c);

But it doesn't work.

Upvotes: 1

Views: 343

Answers (5)

6502
6502

Reputation: 114481

I think it's a waste to sort all the elements and then printing only those starting with c. What about instead sorting only those?

struct first_char_is {
    char x;
    first_char_is(char x) : x(x) {}
    bool operator()(const std::string& s) {
        return s.size() > 0 && s[0] == x;
    }
};

void f(const std::vector<std::string>& vs, const char c)
{   
    std::vector<std::string> tmp;
    std::copy_if(vs.begin(), vs.end(), std::back_inserter(tmp),
                 first_char_is(c));
    std::sort(tmp.begin(), tmp.end());
    std::ostream_iterator<std::string> out(std::cout, "\n");
    std::copy(tmp.begin(), tmp.end(), out);
}

In C++ however strings are mutable and COW string implementations have their own problems. This means that when you duplicate a vector of strings all the string data gets also duplicated. To save memory a different approach would be to just keep and sort the indexes to the original array, but I'm not sure if this fits the "stl-only" artificial requirement (whatever that could mean).

struct IndirectComp {
    const std::vector<std::string>& vs;
    IndirectComp(const std::vector<std::string>& vs) : vs(vs) {}
    const bool operator()(int a, int b) {
        return vs[a] <= vs[b];
    }
};

void f(const std::vector<std::string>& vs, const char c)
{
    std::vector<int> ix;
    for (int i=0,n=vs.size(); i<n; i++) {
        if (vs[i].size() && vs[i][0] == c) {
            ix.push_back(i);
        }
    }
    std::sort(ix.begin(), ix.end(), IndirectComp(vs));
    for (int i=0,n=ix.size(); i<n; i++) {
        std::cout << vs[ix[i]] << "\n";
    }
}

Upvotes: 4

Tristan Brindle
Tristan Brindle

Reputation: 16824

Like Benjamin Lindley, I think the accepted answer is sub-optimal, and this might be a better approach (untested, but you get the idea):

void f(std::vector<std::string> vs, const char c)
{
    std::vector<std::string> result;
    std::copy_if(vs.begin(), vs.end(), std::back_inserter(result),
                 [c](const std::string& s) { return !s.empty() && s.front() == c; });
    std::sort(result.begin(), result.end());
    std::copy(result.begin(), result.end(), std::ostream_iterator(std::cout, "\n"));
}

If we assume the input vector has N entries of which K begin with the letter c, then this performs an O(N) search/copy followed by an O(K.logK) (average) sort, and then an O(K) "copy" to the output stream. The approach in Zeta's answer has an O(N.logN) sort first, which will dominate if K << N (as we might expect for regular text).

EDIT: As Jerry Coffin's answer points out, if messing with the input vector is acceptable (it's a const reference in the original question), then you can get away without a temporary copy by using std::partition -- kudos to him for thinking of this.

Upvotes: 1

Jerry Coffin
Jerry Coffin

Reputation: 490148

The answers you've gotten are simple and neat looking, but could be quite inefficient if you have a lot of data that doesn't fit the filter (starting with 'c' in this case).

I see two basic problems. First, they sort all the data, whether it fits the filter or not. That would be quite inefficient in itself. Second, they then use copy_if to do a filtered copy of the data--but copy_if does nothing to take advantage of the sorting. It does a linear search, so it looks at all all the input data, including a lot that the right algorithm would already know wasn't worth considering (e.g., once it gets to something that starts with 'd', it might as well stop, because no more data is worth considering).

Alternatively, they do the filtering first, but do so by copying all the relevant data to a newly created vector, then sorting that copy of the data. This might be reasonably efficient with respect to speed, but could use quite a bit of extra memory.

I think it's better to filter first but without unnecessary copying, then sort only the data that fits the filter, and finally copy the sorted data to the output. In this case, we can use std::partition to filter the data efficiently.

auto end = std::partition(in.begin(), in.end(), 
    [](std::string const &s) { return s[0] == 'c';});

std::sort(in.begin(), end);
std::copy(in.begin(), end, std::ostream_iterator<std::string>(std::cout, "\n"));

Short of an exceptionally horrible implementation of std::partition, filtering then sorting should always be at least as fast as sorting then filtering -- and if a significant amount of the original input gets filtered out, filtering first is likely to be significantly faster. It can clearly save quite a bit of memory compared to creating a filtered copy, then sorting the copy. In most cases, it'll be quite a bit faster as well. Partition only has to swap strings, rather than copy them, which is typically quite a bit faster (the primary exception being short strings when std::string uses the short string optimization).

Upvotes: 5

Jan Herrmann
Jan Herrmann

Reputation: 2767

There are 3 soloutions:

  1. sort then copy the elements which satisfy your requirement, complexite N*log(N) + N
  2. copy the elements which satisfy your requirements then sort, complexity N+n*log(n)
  3. sort and search for the range with std::lowerbound, complexity N*log(N) + 2*log(N) + n

In all cases N is the size of your vector and n is the number of elements satisfying your predicate. In general n<=N and depending on your dataset (a normal english text) it couldbe n << N.

Upvotes: 0

Zeta
Zeta

Reputation: 105886

The easiest way is to use a lambda as predicate:

void f(std::vector<std::string> vs, const char c)
{   
    std::sort(vs.begin(), vs.end());
    std::ostream_iterator<std::string> out(std::cout, "\n");
    std::copy_if(vs.begin(), vs.end(), out, 
       [c](const std::string & s){return !s.empty() && s.front() == c;}
    ); 
}

It's not possible to write a predicate using only <algorithm>. However, the lambda can be reconstructed with std::bind, std::equal_to, std::string::front, std::logical_and and std::string::empty from <functional>. However, that would make your code very complicated.

Since you're already using C++11, I recommend you to use lambdas.

Upvotes: 1

Related Questions