Hal T
Hal T

Reputation: 547

Fastest/most minimal way to check if string contains multiple words

I'm using c++11, and can use regex stuff. I'm wondering what's the fastest way to check if a string contains multiple words, and if so how many. Words in this case are defined as groups of characters separated by whitespace.

I have a few options:

  1. Split string by whitespace, count length of split
  2. Use some sort of regex
  3. Count whitespace characters

Option 1 is the easiest way, but accounting for multiple whitespace characters makes splitting a bit more complicated. 2 is probably slower and I'm not sure how I'd get a count out of it. 3 is the fastest I can think of, but there are probably many corner cases to consider. I would like my solution to be as efficient as possible and include as little extra libraries as possible. It's a solvable problem for me, but I need more insight as to what the best solution is.

I'm leaning towards the first, but then what functions would be best? istringstream plus an iterator, stringstream, some char* magic? I'm not certain what the fastest method would be.

Upvotes: 0

Views: 1363

Answers (2)

CroCo
CroCo

Reputation: 5741

In this scenario, you could utilize an associative container. C++ has a variety of choices. For example, you can utilize std::map. In the following code, you can count how many multiple words occur in text.

#include <iostream>
#include <string>
#include <map>
#include <algorithm>


int main()
{
    std::map<std::string,int> strCount;
    std::string str("AA BB ABC AA GE AAf FF JJ BB CC "); 
    std::string temp;

    // Split String Based on Whitespace (i.e. you need to modify it to suit the text format you have. )
    for ( int i(0); i < str.size(); ++i ){
        temp += str[i];
        if ( str[i] == ' ' ){
            temp.pop_back();
            ++strCount[temp]; // <-- if element new, insert it in map and associate new counter, otherwise increment counter of element. 
            temp.clear();
       }
    }


    std::map<std::string,int>::const_iterator iter;   
    for( iter = strCount.begin(); iter != strCount.end(); iter++ ) {
        std::cout << "#: " << iter->second << " string: " << iter->first << std::endl;
    }
    return 0;
}

The output of the preceding code is

#: 2 string: AA
#: 1 string: AAf
#: 1 string: ABC
#: 2 string: BB
#: 1 string: CC
#: 1 string: FF
#: 1 string: GE
#: 1 string: JJ

Upvotes: 0

Paul Bentley
Paul Bentley

Reputation: 343

I would iterate through the string, counting the words and iterating over any consecutive whitespace.

  • Increase word count whenever moving from whitespace to non-whitespace.
  • Increase word count if string starts with non-whitespace

    int countWords(string& toCount, const string& whitespace){
        enum countStatus {
            startOfString,
            initialWhitespace,
            movePastWord,
            movePastWhitespace
        } status=startOfString;    
    
        int wordCount=0;
    
        for(char& c : toCount) {
            bool characterIsWhitespace=false;
    
            if (whitespace.find(c)!=string::npos) {
                characterIsWhitespace=true;
            }
    
            switch(status) {
                case startOfString:
                    if (characterIsWhitespace) {
                        status=initialWhitespace;
                    } else {
                        status=movePastWord;
                        wordCount++;
                    }
                break;
    
                case initialWhitespace:
                    if (!characterIsWhitespace) {
                        wordCount++;
                        status=movePastWord;
                    }
                break;
    
                case movePastWord:
                    if (characterIsWhitespace) {
                        status=movePastWhitespace;
                    }
                break;
    
                case movePastWhitespace:
                    if (!characterIsWhitespace) {
                        wordCount++;
                        status=movePastWord;
                    }
            }
        }
    
        return wordCount;
    }
    

Upvotes: 1

Related Questions