Vastor

Reputation: 77

Splitting string into a vector<string> of words

In Accelerated C++ (the book), I found two programs that produce the same output but process the input differently, and part of the second one confuses me.

The code below simply reads the input word by word in a loop and prints each word on its own line. When the user signals end-of-file, the program ends.

int main()
{
    string s;
    while (cin >> s)
        cout << s << endl;
    return 0;
}

Unlike the code above, this one stores each word in a vector, using the indices i and j to find the non-whitespace characters. My real question is that I don't understand how this works with the vector.

What is whitespace in vector? An element?

At first I thought the program would go through the input character by character, because whitespace is a character (which is what i and j are for). But then the book says it goes through the input word by word. I don't know how to test this myself, for example by watching what happens inside the program as it runs.

vector<string> split(const string& s)
{
    vector<string> ret;
    typedef string::size_type string_size;
    string_size i = 0;

    // invariant: we have processed characters [original value of i, i) 
    while (i != s.size())
    {
        // ignore leading blanks
        // invariant: characters in range [original i, current i) are all spaces
     while (i != s.size() && isspace(s[i]))
         ++i;

     // find end of next word
     string_size j = i;
     // invariant: none of the characters in range [original j, current j)is a space
     while (j != s.size() && !isspace(s[j]))
         j++;
         // if we found some nonwhitespace characters 
         if (i != j) {
             // copy from s starting at i and taking j - i chars
             ret.push_back(s.substr(i, j - i));
             i = j;
         }
    }
    return ret;
}

int main() {
    string s;
    // read and split each line of input 
    while (getline(cin, s)) {
        vector<string> v = split(s);

        // write each word in v
        for (vector<string>::size_type i = 0; i != v.size(); ++i)
             cout << v[i] << endl;
    }
    return 0;
}

Upvotes: 1

Views: 16951

Answers (2)

Jagannath

Reputation: 4025

If you are just splitting on whitespace, you don't need to write a custom function. The standard library already has tools for this.

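#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::string line;
    std::vector<std::string> strings;
    while (std::getline(std::cin, line))
    {
        // operator>> skips whitespace, so each extracted string is one word
        std::istringstream s(line);
        strings.insert(strings.end(),
            std::istream_iterator<std::string>(s),
            std::istream_iterator<std::string>());
    }

    // For simplicity's sake, print with a lambda (requires C++11).
    std::for_each(strings.begin(), strings.end(), [](const std::string& str)
    {
        std::cout << str << "\n";
    });
    return 0;
}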
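If you want the same idea as a drop-in replacement for split, it can be wrapped in a function. This is only a minimal sketch: the name split_ws is my own (it is not from the book or the standard library), and it assumes that "word" means whatever operator>> extracts, that is, a run of non-whitespace characters.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (name chosen for illustration): splits one line
// on whitespace by letting operator>> do the tokenizing.
std::vector<std::string> split_ws(const std::string& line)
{
    std::istringstream in(line);
    // A default-constructed istream_iterator acts as the end-of-stream marker,
    // so this range covers every word the stream can extract.
    return std::vector<std::string>(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>());
}
```

Calling split_ws on a line that contains only blanks should give an empty vector, and leading, trailing, or repeated whitespace does not produce empty words.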

Upvotes: 2

AusCBloke

Reputation: 18492

The code you posted above does not compile as written, because it is missing some necessary braces ({, }). EDIT: Whether it splits a line of text into words (based on whitespace) or into individual characters depends on where those braces go.
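To illustrate how much the brace placement matters, here is a sketch of one possible placement, assuming the if ends up inside the inner while loop. The name split_chars is hypothetical, and this is my guess at a mis-braced variant rather than the exact code from the post. The unsigned char casts are my addition as well.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical mis-braced variant, for illustration only: the `if` sits
// inside the inner loop, so it runs after every single ++j.
std::vector<std::string> split_chars(const std::string& s)
{
    std::vector<std::string> ret;
    std::string::size_type i = 0;
    while (i != s.size()) {
        // skip leading whitespace
        while (i != s.size() && std::isspace(static_cast<unsigned char>(s[i])))
            ++i;

        std::string::size_type j = i;
        while (j != s.size() && !std::isspace(static_cast<unsigned char>(s[j]))) {
            ++j;
            if (i != j) {   // always true here: j is exactly i + 1
                ret.push_back(s.substr(i, j - i));   // a one-character "word"
                i = j;
            }
        }
    }
    return ret;
}
```

With this placement, the input "ab c" gives the three elements "a", "b" and "c" instead of the two words "ab" and "c".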

Here is a fixed version of the code that splits on each word, rather than each character, by simply moving the last if statement in split outside of its immediate while block:

#include <cctype>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

vector<string> split(const string& s)
{
   vector<string> ret;
   typedef string::size_type string_size;
   string_size i = 0;

   // invariant: we have processed characters [original value of i, i) 
   while (i != s.size()) {
      // ignore leading blanks
      // invariant: characters in range [original i, current i) are all spaces
      while (i != s.size() && isspace(s[i]))
         ++i;

      // find end of next word
      string_size j = i;
      // invariant: none of the characters in range [original j, current j)is a space
      while (j != s.size() && !isspace(s[j]))
         j++;

      // if we found some nonwhitespace characters 
      if (i != j) {
         // copy from s starting at i and taking j - i chars
         ret.push_back(s.substr(i, j - i));
         i = j;
      }
   }
   return ret;
}

int main() {
   string s;
   // read and split each line of input 
   while (getline(cin, s)) {
      vector<string> v = split(s);

      // write each word in v
      for (vector<string>::size_type i = 0; i != v.size(); ++i)
         cout << v[i] << endl;
   }
   return 0;
}

What happens to the string passed to split is:

  • While there are still characters left in the string (while (i != s.size()))
    • While the current character is whitespace (while (i != s.size() && isspace(s[i])))
      • Increment the counter until we get to the start of a word (++i)
    • Set the end of the word to the start of the word (string_size j = i)
    • While we're still inside this word and haven't reached whitespace (while (j != s.size() && !isspace(s[j])))
      • Increment the counter indicating the end of the word (j++)
    • If there are some non-whitespace characters, i.e. the end is greater than the start (if (i != j))
      • Create a sub-string from the start point to the end point of the word (s.substr(i, j - i)), add that word to the vector (ret.push_back(..)), and continue from the end of the word (i = j).
    • Rinse and repeat.
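If you want to watch these steps happen, one option is to record the values of i and j each time a word is found. The sketch below runs the same loop as split. WordSpan, split_traced and print_trace are names I made up for illustration, and the unsigned char casts are my addition (passing a negative char to isspace is undefined behaviour).

```cpp
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical record type: where a word starts, one past where it ends,
// and the text that was copied into the vector.
struct WordSpan {
    std::string::size_type begin;
    std::string::size_type end;
    std::string text;
};

// Same loop structure as split, but keeps i and j alongside each word.
std::vector<WordSpan> split_traced(const std::string& s)
{
    std::vector<WordSpan> spans;
    std::string::size_type i = 0;
    while (i != s.size()) {
        // skip whitespace until i is at the start of a word (or the end)
        while (i != s.size() && std::isspace(static_cast<unsigned char>(s[i])))
            ++i;

        // advance j until it is one past the last character of the word
        std::string::size_type j = i;
        while (j != s.size() && !std::isspace(static_cast<unsigned char>(s[j])))
            ++j;

        if (i != j) {
            spans.push_back(WordSpan{i, j, s.substr(i, j - i)});
            i = j;
        }
    }
    return spans;
}

// Writes one line per word in the form: i=<start> j=<end> word=<text>
void print_trace(const std::string& s, std::ostream& out)
{
    const std::vector<WordSpan> spans = split_traced(s);
    for (std::vector<WordSpan>::size_type k = 0; k != spans.size(); ++k)
        out << "i=" << spans[k].begin << " j=" << spans[k].end
            << " word=" << spans[k].text << '\n';
}
```

For the input "  hi there" (two leading spaces), print_trace reports i=2 j=4 for "hi" and i=5 j=10 for "there". So the function does walk the string character by character, but the vector only ever receives whole words, and whitespace is never stored as an element.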

Upvotes: 3
