Harish
Harish

Reputation: 69

Finding words in a (weird) string in C++

What is technically wrong in this program? The expected result is 6 since that is the total number of words present in the string.

#include <iostream>
using namespace std; 

int main()
{
    string str = "  Let's   count     the      number     of    words  ";
    int word = 0;
    for (int i = 0; str[i] != '\0';)
    {
        if ((str[i] == 32 && str[i + 1] == 32) || (str[i] == 32 && str[i - 1] == 32))
        {
            ++i;
        }
        else if ((str[i] == 32 && str[i - 1] != 32) || (str[i] == 32 && str[i + 1] != 32))
        {
            word++;
        }
        ++i;
    }
    cout << "No. of words: " << word << endl;
    return 0;
}

My incorrect result:

No. of words: 0

Also, if I try changing the spaces in the string or even the string itself to a totally new set of spaced out words, say:

string str = "   Hello world   ";
string str = "Hello    world! How   are you?   ";

I still get incorrect results, but different from 0. I'm new to C++ programming and these kinds of strange behaviors are giving me nightmares. Is this common? What I can do to get this corrected?

If you could highlight or correct my program the way I'd written it, it would be much helpful and quick for me to understand the mistake instead of having to know some new commands at this point. Because, as I said, I'm a total beginner in C/C++.

Thanks for your time!

Upvotes: 0

Views: 199

Answers (2)

Useless
Useless

Reputation: 67723

I'm new to C++ programming and these kinds of strange behaviors are giving me nightmares. Is this common?

Yes, it's very common. You've written a load of logic piled up in a heap and you don't have the tools to understand how it behaves.

What I can do to get this corrected?

You can work on this from both directions:

  1. debug this to improve your understanding of how it operates:

    • identify in advance what you expect it to do for some short input, at each line
    • single-step through it in the debugger to see what it actually does
    • think about why it doesn't do what you expected

    Sometimes the problem is that your code doesn't implement your algorithm correctly, and sometimes the algorithm itself is broken, and often it's a bit of both. Working through both will give you some insight.

  2. write code that is easier to understand in the first place (and equivalently, write algorithms that are easy to reason about).

    This depends on you having some intuition about whether something is easy to reason about, which you develop from iterating step 1.

... instead of having to know some new commands at this point.

Well, you need to learn to use a debugger anyway, so now is as good a time to start as any.

We can certainly improve the existing code, although I'd prefer to fix the logic. In general I'd encourage you to abstract your existing if conditions out into little functions, but the problem is that they don't currently seem to make any sense.

So, how do we define a word?

Your code says it is at least one non-space character preceded or followed by a space. (Do definitely prefer ' ' to 32, by the way, and std::isspace is better than either.)

However your code's implied definition is problematic, because:

  • each word longer than one character has both a first and last character, and you'll count each of them
  • you can't check whether the first character is preceded by anything, without going out of bounds
  • the last character is followed by the null terminator, but you don't count that as whitespace

Let's just choose a different definition, that doesn't require reading str[i-1], and doesn't require the tricky traversal your current code gets wrong.

I claim that a word is a contiguous substring of non-whitespace characters, and words are separated by contiguous substrings of whitespace characters. So, instead of looking at each pair of consecutive characters, we can write pseudocode to work in those terms:

    for (current = str.begin(); current != str.end(); ) {
        // skip any leading whitespace
        current = find_next_non_whitespace(str, current);
        if (current != str.end()) {
            // we found a word
            ++words;
            current = find_next_whitespace(str, current);
        }
    }
            

NB. When I talked about abstracting your code out into little functions, I meant things like find_next_non_whitespace - they should be trivial to implement, easy to test, and have a name that tells you something.

When I said your existing conditions didn't seem to make sense, it's because replacing

if ((str[i] == 32 && str[i + 1] == 32) || (str[i] == 32 && str[i - 1] == 32))

with, say,

if (two_consecutive_spaces(str, i))

prompts more questions than it answers. Why have a special case for exactly two consecutive spaces? Is it different to just one space? What will actually happen if we have two words with a single space between them? Why do we advance by two characters in this case, but only one on the word branch?

The fact that the code can't easily be mapped back onto explicable logic is a bad sign - even if it worked (which we know it doesn't), we don't understand it well enough to ever change, extend or refactor it.

Upvotes: 2

alurab
alurab

Reputation: 42

I think you have some ways to do it. Take a look at this code. Very similar to yours:

string s = "  Let's   count     the      number     of    words  ";

int word = 0;

for (auto i = 0; s[i] != '\0'; i++) {
    if (i == 0) {
        if (s[i] != ' ') {
            ++word;
        }
        continue;
    }

    if (s[i - 1] == ' ' && s[i] != ' ') {
        ++word;
    }
}

cout << "No of Words: " << word << endl;

The idea is to iterate over the string reading character by character. So we do some logic:

  • If we are in the first string character and it's equals to ' ', go to the next loop iteration
  • If we are in the first string character and it's different from ' ', means we are starting a word, so counts it and jump to the next loop iteration.
  • If we reach the second if, means we are not at the first position, so trying to access i - 1 should be valid. Then we just check if the previous char is a blank space and the current one it's not. This means we are starting a new word. So counts it and jump to the next loop iteration.

Another and more simple way to do it is using stringstream:

string s = "  Let's   count     the      number     of    words  ";
stringstream ss(s);
string sub;
int word = 0;
while (ss >> sub) {
    ++word;
}
cout << "No of Words: " << word << endl;

This way you're basically extracting word by word from your string.

Upvotes: 0

Related Questions