Reputation: 547
I'm using c++11, and can use regex stuff. I'm wondering what's the fastest way to check if a string contains multiple words, and if so how many. Words in this case are defined as groups of characters separated by whitespace.
I have a few options:
Option 1 is the easiest way, but accounting for multiple whitespace characters makes splitting a bit more complicated. 2 is probably slower and I'm not sure how I'd get a count out of it. 3 is the fastest I can think of, but there are probably many corner cases to consider. I would like my solution to be as efficient as possible and include as little extra libraries as possible. It's a solvable problem for me, but I need more insight as to what the best solution is.
I'm leaning towards the first, but then what functions would be best? istringstream
plus an iterator, stringstream
, some char*
magic? I'm not certain what the fastest method would be.
Upvotes: 0
Views: 1363
Reputation: 5741
In this scenario, you could utilize an associative container. C++ has a variety of choices. For example, you can utilize std::map
. In the following code, you can count how many multiple words occur in text.
#include <iostream>
#include <string>
#include <map>
#include <algorithm>
int main()
{
std::map<std::string,int> strCount;
std::string str("AA BB ABC AA GE AAf FF JJ BB CC ");
std::string temp;
// Split String Based on Whitespace (i.e. you need to modify it to suit the text format you have. )
for ( int i(0); i < str.size(); ++i ){
temp += str[i];
if ( str[i] == ' ' ){
temp.pop_back();
++strCount[temp]; // <-- if element new, insert it in map and associate new counter, otherwise increment counter of element.
temp.clear();
}
}
std::map<std::string,int>::const_iterator iter;
for( iter = strCount.begin(); iter != strCount.end(); iter++ ) {
std::cout << "#: " << iter->second << " string: " << iter->first << std::endl;
}
return 0;
}
The output of the preceding code is
#: 2 string: AA
#: 1 string: AAf
#: 1 string: ABC
#: 2 string: BB
#: 1 string: CC
#: 1 string: FF
#: 1 string: GE
#: 1 string: JJ
Upvotes: 0
Reputation: 343
I would iterate through the string, counting the words and iterating over any consecutive whitespace.
Increase word count if string starts with non-whitespace
int countWords(string& toCount, const string& whitespace){
enum countStatus {
startOfString,
initialWhitespace,
movePastWord,
movePastWhitespace
} status=startOfString;
int wordCount=0;
for(char& c : toCount) {
bool characterIsWhitespace=false;
if (whitespace.find(c)!=string::npos) {
characterIsWhitespace=true;
}
switch(status) {
case startOfString:
if (characterIsWhitespace) {
status=initialWhitespace;
} else {
status=movePastWord;
wordCount++;
}
break;
case initialWhitespace:
if (!characterIsWhitespace) {
wordCount++;
status=movePastWord;
}
break;
case movePastWord:
if (characterIsWhitespace) {
status=movePastWhitespace;
}
break;
case movePastWhitespace:
if (!characterIsWhitespace) {
wordCount++;
status=movePastWord;
}
}
}
return wordCount;
}
Upvotes: 1