jai

Reputation: 1

C++ Find Word in String without Regex

I'm trying to find a certain word in a string, but only when that word stands alone. For example, if I had a word bank:

789540132143
93
3
5434

I only want a match to be found for the value 3, as the other values do not match exactly. I used the normal string::find function, but that found matches for all four values in the word bank because they all contain 3.

There is no whitespace surrounding the values, and I am not allowed to use regex. I'm looking for the fastest way to accomplish this task.

Upvotes: 0

Views: 300

Answers (2)

Azad

Reputation: 1120

If you want to count the words, you should use a string-to-int map. Read each word from your file into a string using >>, then increment the map entry accordingly:

string word;
map<string,int> count;
ifstream input("file.txt");
while (input >> word) {   // loop ends as soon as a read fails, so the last word is not counted twice
    count[word]++;
}

Using >> has the benefit that you don't have to worry about whitespace.
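As a rough sketch of the same idea in self-contained form, the counting loop can be wrapped in a function that takes any input stream. The names countWords and countWordsIn are hypothetical helpers introduced here for illustration, not part of the original answer, and the sketch assumes words are separated only by whitespace (including newlines).

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Count how often each whitespace-separated token occurs in a stream.
// (countWords is a hypothetical helper name, not from the original answer.)
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {   // extraction fails at end of input, ending the loop
        ++counts[word];    // operator[] value-initializes a new entry to 0
    }
    return counts;
}

// Convenience wrapper for in-memory text (also a hypothetical helper).
std::map<std::string, int> countWordsIn(const std::string& text) {
    std::istringstream in(text);
    return countWords(in);
}
```

With the word bank from the question passed as one string with a newline after each value, the entry for "3" would be 1, because each token is compared as a whole and never as a substring.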

Upvotes: 1

Christophe

Reputation: 73366

It all depends on the definition of a word: is it a string separated from others by whitespace? Or are other word separators (e.g. comma, dot, semicolon, colon, parentheses...) relevant as well?

How to parse for words without regex:

Here is an acceptable approach using find() and its variant find_first_of():

string myline;     // line to be parsed
string what="3";   // string to be found
string separator=" \t\n,;.:()[]";  // word separators
while (getline(cin, myline)) {
    size_t nxt=0;
    while ( (nxt=myline.find(what, nxt)) != string::npos) {  // search occurrences of what
        if (nxt==0||separator.find(myline[nxt-1])!=string::npos) { // if at the beginning of a word
            size_t nsep=myline.find_first_of(separator,nxt+1);   // check if it extends to the end of the word
            if ((nsep==string::npos && myline.length()-nxt==what.length()) || nsep-nxt==what.length()) {
                cout << "Line: "<<myline<<endl;    // bingo !!
                // print the end of the match rather than nsep, which is npos at end of line
                cout << "from pos "<<nxt<<" to " << nxt+what.length() << endl;
            }
        }
        nxt++;  // ready for next occurrence
    }
}


The principle is to check whether each occurrence found corresponds to a whole word, i.e. it starts at the beginning of the line or of a word (the previous char is a separator) and it extends to the next separator (or the end of the line).
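The two boundary checks described above can also be written as a small reusable function. This is only a sketch: the name findWholeWord and the default separator list are assumptions made for illustration, and it assumes the searched word itself contains no separator characters.

```cpp
#include <cstddef>
#include <string>

// Returns the position of the first occurrence of `what` in `line` that is
// bounded on both sides by a separator or by the start/end of the line.
// Returns std::string::npos if there is no such occurrence.
// (findWholeWord is a hypothetical helper name used for illustration.)
std::size_t findWholeWord(const std::string& line, const std::string& what,
                          const std::string& separators = " \t\n,;.:()[]") {
    if (what.empty()) return std::string::npos;   // nothing sensible to match

    std::size_t pos = 0;
    while ((pos = line.find(what, pos)) != std::string::npos) {
        const std::size_t end = pos + what.length();
        // Left boundary: start of line, or previous char is a separator.
        const bool startsWord =
            pos == 0 || separators.find(line[pos - 1]) != std::string::npos;
        // Right boundary: end of line, or next char is a separator.
        const bool endsWord =
            end == line.length() || separators.find(line[end]) != std::string::npos;
        if (startsWord && endsWord) return pos;
        ++pos;   // continue searching after this partial match
    }
    return std::string::npos;
}
```

For the word bank in the question, where each value sits alone on its line, only the line "3" yields a position other than npos when searching for "3"; in that special case a plain equality test (line == what) would do the same job with less work.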

How to solve your real problem:

You could have the fastest word search function in the world, but if you use it to solve your problem of counting words, as you've explained in your comment, you'll waste a lot of effort, because the whole input has to be searched again for every word!

The best way to achieve this would certainly be to use a map<string, int> to store and update a counter for each string encountered in the file.

You then just have to parse each line into words (you could use find_first_of() as suggested above) and use the map:

 mymap[word]++; 
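A minimal sketch of that parsing step, assuming the same separator set as in the search code: find_first_not_of() skips to the start of the next word and find_first_of() locates its end. The function name countWordsInLine is hypothetical and introduced only for this example.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Split `line` on any of the separator characters and increment the counter
// of every word found. Runs of consecutive separators produce no empty words.
// (countWordsInLine is a hypothetical helper name used for illustration.)
void countWordsInLine(const std::string& line,
                      std::map<std::string, int>& counts,
                      const std::string& separators = " \t\n,;.:()[]") {
    std::size_t start = line.find_first_not_of(separators);   // first word start
    while (start != std::string::npos) {
        const std::size_t end = line.find_first_of(separators, start);
        // If no separator follows, the word runs to the end of the line.
        const std::size_t len =
            (end == std::string::npos) ? std::string::npos : end - start;
        ++counts[line.substr(start, len)];
        // With end == npos this returns npos and the loop terminates.
        start = line.find_first_not_of(separators, end);
    }
}
```

Calling this once per line read with getline() makes a single pass over the input, after which the count for any word is one map lookup away.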

Upvotes: 0
