Reputation: 27446

Better way to count things?

In one of my programs for school, I use the following function to count the frequency of identifiers in a string, separated by newlines and #:

Input:

dog
cat
mouse
#
rabbit
snake
#

Function:

//assume I have the proper includes, and am using namespace std
vector< pair<string,int> > getFreqcounts(string input) {
    vector<string> items = splitString(input,"\n");
    map<string,int> counts;

    for (int i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;
        counts[items[i]] = 0;
    }
    for (int i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;
        counts[items[i]]++;
    }

    return vector< pair<string,int> > (counts.begin(),counts.end());
}

I would like to at the very least

remove the double for loop
find a better way to get a vector< pair<string,int> >

Any ideas?

BTW, this is NOT homework. The real homework will use this function, but this is purely out of my own curiosity and desire to have "better" code.

Upvotes: 0

Answers (4)

Jerry Coffin

Reputation: 490603

You can get rid of the first for loop by simply deleting it. It accomplishes nothing useful. When/if the subscript into the map creates a new item, that item will have the chosen key, and your associated int will be initialized to zero automatically.

Personally, I'd probably do things a bit differently, using a stringstream instead of your SplitString(). I'm hesitant about posting code, but I guess I'll trust you...

typedef vector<pair<string, int> > count_vec;

count_vec GetFreqCounts(string const &input) { 
    istringstream in(input);
    string line;
    map<string, int> counts;

    while (getline(in, line))
        if (line != "#")
            ++counts[line];
    return count_vec(counts.begin(), counts.end());
}

Edit: I honestly didn't pay a whole lot of attention to efficiency as I was writing this, but I think Steve Jessop's comment on it is pretty accurate. As long as the input is small, it won't make any real difference. If the input is really big, the fact that this only uses an extra copy of one word at a time could save enough memory to be meaningful.

The solution Steve gave in his reply looks pretty nice too though. Since it also processes words as they're produced, I'd expect it to have characteristics similar to the code above. If you can break the string into words faster than stringstream does, it'll undoubtedly be faster. Given the number of virtual functions that get in the way with iostreams, there's a pretty good chance of that -- but unless you're dealing with a lot of text there's not much chance of it making a significant difference. Of course, exactly what qualifies as significant is open to question. To put it in perspective, I ran some similar code across a word list I had handy. Using code pretty close to what's above, it processes text at a little over 10 megabytes a second.

Upvotes: 2

rmn

Reputation: 2426

If you want to keep the current logic but contain all of it in a single loop, you can use find(), like so:

map<string,int> counts;
loop on curr ..
map<string,int>::iterator it = counts.find(curr);
if (it != counts.end())
  ++it->second;
else
  counts[curr] = 1;

But that's not what I would do. There's no need to explicitly initialize the map entry in the first time as it will default to zero anyway, so you can use:

map<string,int> counts;
loop on curr ..
  if (..
    ++map[curr];

Another thing i would do is use std::getline() with '\n' and '#' being the delimiters.

Upvotes: 0

Steve Jessop

Reputation: 279385

In a comment to Jerry's answer I mentioned using a functor. Here's the sort of thing I mean (untested code):

struct StringCounter {
    std::map<std::string, int> counts;
    void operator()(const std::string &s) {
        ++counts[s];
    }
};

template <typename Output>
void splitString(const string &input, const string &separator, Output &out) {
    // do whatever you currently do to get each string, call it "s"...
    out(s);
    // lather, rinse, repeat
}

vector< pair<string,int> > getFreqcounts(const string &input) {
    StringCounter sc;
    splitString(input,"\n",sc);
    return vector< pair<string,int> > (sc.counts.begin(),sc.counts.end());
}

Upvotes: 1

fbrereto

Reputation: 35935

My understanding of std::map is such that the entire first loop can simply be eliminated. The first time you try to access a node that does not exist the map will default-create it for you, setting the initial count to zero (the default-ctor behavior of a builtin type.) That should be all the changes you need to make to your code, and the behavior should be the same.

Update: On a side note in the code you have provided counts will be sorted according to operator< defined for std::string (the key type for your map), which will sort the map nodes lexicographically. There's no need to pump the results through a vector and sort the vector - the map is handling this for you automatically.

Upvotes: 5

Better way to count things?

Answers (4)

Related Questions