mynameisdlo

Reputation: 589

How to search for an exact word?

Is there a way to match a word only when it stands on its own, and not when it occurs inside another word? As you can see below, the output reports that the word 'day' was found twice, but only because 'day' is also part of 'today'. I would like the search to count 'day' only as a standalone word and to ignore the occurrence inside 'today'.

Is this possible?

Note: The assignment asks us to use string manipulators.

//search for particular word - member function
std::cout << "Please indicate a word which you like to be found in the paragraph you entered: "; 
getline(std::cin, searchWord);

//pos is the index in the string where the word was found; each search resumes from there and runs to the end of the string.
size_t pos = 0;
int wordCount = 0;

//npos means "not found" (it is the largest size_t value, i.e. size_t(-1)).
while (( pos = userParagraph.find(searchWord, pos)) != std::string::npos) {
    ++pos;
    ++wordCount;
}

if (wordCount == 0) {
    std::cout << "The word you entered, '" << searchWord << "', was not found." << std::endl << std::endl;
}
else {
    std::cout << searchWord << " was Found " << wordCount << " times." << std::endl << std::endl;
}

search for word

Upvotes: 1

Views: 279

Answers (2)

cigien

Reputation: 60228

When you find a match, you can check whether the adjacent characters are letters, using std::isalpha, and count the match only if they are not.

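while ((pos = userParagraph.find(searchWord, pos)) != std::string::npos) {
    // Count the match only if no letter sits directly before or after it.
    // The unsigned char cast avoids undefined behavior for negative char values.
    if ((pos == 0 || !std::isalpha(static_cast<unsigned char>(userParagraph[pos - 1])))
        && (pos + searchWord.size() == userParagraph.size()
            || !std::isalpha(static_cast<unsigned char>(userParagraph[pos + searchWord.size()]))))
        ++wordCount;

    ++pos;
}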

and now the word won't be counted if it's part of another word.

Note that the additional checks are needed to make sure that you don't index into an invalid position of the string.
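For reference, the same check can be wrapped in a small self-contained function. This is only a sketch: the name `countWholeWord` and the decision to return 0 for an empty search word are assumptions made for illustration, not part of the question's code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts the occurrences of `word` in `text` that are
// not directly preceded or followed by a letter.
std::size_t countWholeWord(const std::string& text, const std::string& word) {
    // Assumption: an empty search word never matches (find("") would
    // otherwise succeed at every position).
    if (word.empty()) return 0;

    // Cast to unsigned char: passing a negative char to std::isalpha is
    // undefined behavior.
    auto isLetter = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    };

    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(word, pos)) != std::string::npos) {
        const std::size_t end = pos + word.size();
        // The pos == 0 and end == text.size() tests come first, so the
        // string is never indexed out of range.
        const bool startsWord = (pos == 0) || !isLetter(text[pos - 1]);
        const bool endsWord = (end == text.size()) || !isLetter(text[end]);
        if (startsWord && endsWord) ++count;
        ++pos;  // resume the search one character after this match
    }
    return count;
}
```

Called as `countWholeWord(userParagraph, searchWord)`, it would replace the loop and the `wordCount` variable in the question.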

Upvotes: 1

Christophe

Reputation: 73376

Yes, this is possible. But it requires you to decide what counts as a word boundary. For example, is '-' a word boundary like a space? Or would you treat it as a letter?

You may for example filter out non-words, by checking if the found string:

  • starts as a new word (i.e. either we are at the beginning of the string, or the character before it is something other than a letter), and
  • ends as a word (i.e. either we reach the last char of the string, or the next char is not a letter).

It looks like this:

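while ((pos = userParagraph.find(searchWord, pos)) != std::string::npos) {
    // The unsigned char cast avoids undefined behavior for negative char values.
    bool wstart = pos == 0 || !std::isalpha(static_cast<unsigned char>(userParagraph[pos - 1]));
    bool wend = pos + searchWord.size() == userParagraph.size()
            || !std::isalpha(static_cast<unsigned char>(userParagraph[pos + searchWord.size()]));
    if (wstart && wend)
        ++wordCount;

    ++pos;
}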
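The word-boundary decision raised at the start of this answer can be made explicit by passing the "is this character part of a word?" test as a parameter. The following is a rough sketch under that assumption; `countWord`, `lettersOnly` and `lettersAndHyphen` are hypothetical names chosen for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical predicate type: returns true if the character belongs to a word.
using WordCharPredicate = std::function<bool(char)>;

// Only letters are word characters, so '-' acts as a boundary.
bool lettersOnly(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Letters and '-' are word characters, so "day-long" is a single word.
bool lettersAndHyphen(char c) {
    return c == '-' || lettersOnly(c);
}

// Counts matches of `word` that have no word character directly before or
// after them, according to `isWordChar`.
std::size_t countWord(const std::string& text, const std::string& word,
                      const WordCharPredicate& isWordChar) {
    if (word.empty()) return 0;  // assumption: an empty word never matches
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(word, pos)) != std::string::npos) {
        const std::size_t end = pos + word.size();
        const bool wstart = pos == 0 || !isWordChar(text[pos - 1]);
        const bool wend = end == text.size() || !isWordChar(text[end]);
        if (wstart && wend) ++count;
        ++pos;
    }
    return count;
}
```

With the text "a day-long trip takes a day", `countWord(text, "day", lettersOnly)` counts both occurrences, while `countWord(text, "day", lettersAndHyphen)` counts only the last one, because the hyphen then ties "day" to "long".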


Caution: this works with single-byte character encodings only. With UTF-8, it would fail for languages that use letters outside the ASCII alphabet (e.g. the bytes of accented letters like é, ñ, ä, ... would be misinterpreted as valid word separators).
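To make the caution concrete, here is a small sketch of the failure and of one crude workaround. It assumes the program runs in the default "C" locale, where std::isalpha accepts only ASCII letters. The workaround treats every byte with a value of 0x80 or above as a word character; that is an assumption of this sketch, not a real Unicode-aware solution, since it also treats non-ASCII punctuation and spaces as letters. All function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Byte-wise letter test: in the default "C" locale only A-Z and a-z qualify.
bool isAsciiLetter(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}

// Crude workaround (assumption): any byte >= 0x80 is part of a multi-byte
// UTF-8 character, and every such character is treated as a letter.
bool isLetterOrUtf8Byte(char c) {
    return static_cast<unsigned char>(c) >= 0x80 || isAsciiLetter(c);
}

// Counts matches of `word` with no word character directly before or after,
// using the supplied test.
std::size_t countWithCheck(const std::string& text, const std::string& word,
                           bool (*isWordChar)(char)) {
    if (word.empty()) return 0;  // assumption: an empty word never matches
    std::size_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(word, pos)) != std::string::npos) {
        const std::size_t end = pos + word.size();
        const bool wstart = pos == 0 || !isWordChar(text[pos - 1]);
        const bool wend = end == text.size() || !isWordChar(text[end]);
        if (wstart && wend) ++count;
        ++pos;
    }
    return count;
}

// "café caf" with é written as its two UTF-8 bytes, 0xC3 0xA9.
const std::string utf8Sample = "caf\xC3\xA9 caf";
```

Searching `utf8Sample` for "caf" with `isAsciiLetter` counts the match inside "café" as well, because the first byte of é is not an ASCII letter; with `isLetterOrUtf8Byte` only the standalone "caf" is counted.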

Upvotes: 2
