John

Reputation: 4324

Using Boost-Regex to parse string into characters and numerals

I'd like to use Boost's Regex library to split a string containing labels and numbers into tokens. For example, 'abc1def002g30' would be separated into {'abc','1','def','002','g','30'}. I modified the example given in the Boost documentation to come up with this code:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main(int argc,char **argv){
    string s,str;
    int count;
    do{
        count=0;
        if(argc == 1)
        {
            cout << "Enter text to split (or \"quit\" to exit): ";
            getline(cin, s);
            if(s == "quit") break;
        }
        else
            s = "This is a string of tokens";

        boost::regex re("[0-9]+|[a-z]+");
        boost::sregex_token_iterator i(s.begin(), s.end(), re, 0);
        boost::sregex_token_iterator j;
        while(i != j)
        {
            str=*i;
            cout << str << endl;
            count++;
            i++;
        }
        cout << "There were " << count << " tokens found." << endl;

    }while(argc == 1);
    return 0;
}

The number of tokens stored in count is correct. However, *i contains only an empty string, so nothing is printed. Any guesses as to what I am doing wrong?

EDIT: as per the fix suggested below, I changed the submatch argument to 0 (as the code above now shows), and it works correctly.

Upvotes: 2

Views: 1091

Answers (1)

holtavolt

Reputation: 4468

From the docs on the sregex_token_iterator:

Effects: constructs a regex_token_iterator that will enumerate one string for each regular expression match of the expression re found within the sequence [a,b), using match flags m (see match_flag_type). The string enumerated is the sub-expression submatch for each match found; if submatch is -1, then enumerates all the text sequences that did not match the expression re (that is to performs field splitting)

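Since your regex matches every item in the string (unlike the sample code, whose regex matched only the separators between tokens), field splitting with a submatch of -1 leaves nothing between the matches, so you get empty results.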

Try replacing the -1 submatch argument with 0, so that the iterator enumerates the full text of each match.
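To make the difference between the two submatch values concrete, here is a minimal sketch. It makes two assumptions that are not part of the original post: it uses std::regex from the standard library instead of Boost (std::sregex_token_iterator takes the same begin/end/regex/submatch arguments), and tokenize is a hypothetical helper name introduced only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): collects the strings
// enumerated by a regex_token_iterator for a given submatch index.
//   submatch ==  0 -> the full text of each match
//   submatch == -1 -> the text between matches (field splitting)
// Assumption: std::regex is used here in place of boost::regex.
std::vector<std::string> tokenize(const std::string& s, int submatch) {
    static const std::regex re("[0-9]+|[a-z]+");
    std::vector<std::string> out;
    std::sregex_token_iterator it(s.begin(), s.end(), re, submatch);
    std::sregex_token_iterator end;  // default-constructed: end of sequence
    for (; it != end; ++it) {
        out.push_back(it->str());    // copy the current sub_match as a string
    }
    return out;
}
```

Calling tokenize("abc1def002g30", 0) should yield the six tokens abc, 1, def, 002, g and 30. Calling it with -1 yields only empty strings, because the matches are adjacent and no text is left between them, which is the behaviour described in the question.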

Upvotes: 2
