Regular expression group matching using Boost::regex

Question

I have strings of the format:

7XXXX 8YYYY 9ZZZZ 0LLLL 7XXXX 8YYYY 9ZZZZ 0LLLL,

where the 7XXXX 8YYYY 9ZZZZ 0LLLL groups can repeat any number of times;
X, Y, Z and L are digits;
the groups starting with 7, 8, 9 and 0 always appear in that order;
some groups can be missing, as in 7XXXX 0LLLL 8YYYY 0LLLL 7XXXX 8YYYY 9ZZZZ 0LLLL.

I want to split the string into these groups and store them in an array or vector. For now I am just printing them with cout. I am trying to do this with the Boost.Regex library.

I am trying it this way, but I can only get either the full string match or the last match for each of the 7, 8, 9, 0 groups, not substrings like 7XXXX 8YYYY 9ZZZZ 0LLLL:

 const char* pat = "(([[:space:]]+7[0-9]{4}){0,1}([[:space:]]+8[0-9]{4}){0,1}([[:space:]]+9[0-9]{4}){0,1}([[:space:]]+0[0-9]{4}){0,1})+";
 boost::regex reg(pat);
 boost::smatch match;
 string example= "71122 85451 75415 01102 75555 82133 91341 02134";

 const int subgroups[] = {0,1,2,3,4,5,6};
 boost::sregex_token_iterator i(example.begin(), example.end(), reg, subgroups);
 boost::sregex_token_iterator j;

 while (i != j)
 {
   cout << "Match: " << *i++ << endl;
 }

Sample output:

Match: 71122 85451 75415 01102 75555 82133 91341 02134

Match: 75555
Match: 82133
Match: 91341
Match: 02134

But I want to get it like this:

71122 85451 
75415 01102 
75555 82133 91341 02134

I know I am doing it wrong, but I can't come up with a good regex for what I want :( Why can't I get all the recursive matches using parentheses?

Wintermute · Accepted Answer

EDIT: Since I completely misunderstood the first time around, I'll just replace the whole answer. I'm thinking along these lines:

const char* pat = "[[:space:]]+((7[0-9]{4})?([[:space:]]+8[0-9]{4})?([[:space:]]+9[0-9]{4})?([[:space:]]+0[0-9]{4})?)";
boost::regex reg(pat);
boost::smatch match;

//                    v-- extra space here to make the match easier.
std::string example= " 71122 85451 75415 01102 75555 82133 91341 02134";

boost::sregex_token_iterator i(example.begin(), example.end(), reg, 1);
boost::sregex_token_iterator j;

while (i != j)
{
  std::cout << "Match: " << *i++ << std::endl;
}
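
As for why the parentheses in the question did not return every group: as far as I know, a capture group that sits inside a repetition only keeps the text from its last iteration, which would explain why the sample output shows just 75555, 82133, 91341 and 02134. Below is a minimal sketch of that behaviour. It assumes std::regex as a stand-in for Boost.Regex so that it compiles without Boost, and the function name last_capture and the simplified pattern are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: matches a whole string made of repeated 7XXXX groups
// and returns what capture group 1 holds once the match is complete.
std::string last_capture(const std::string& text)
{
    // Group 1 sits inside the repeated (?: ... )+ part of the pattern.
    static const std::regex reg("(?:[[:space:]]*(7[0-9]{4}))+");

    std::smatch m;
    if (!std::regex_match(text, m, reg)) {
        return "";  // the text is not a sequence of 7XXXX groups
    }
    // Only the text captured by the final iteration is still available here.
    return m[1].str();
}
```

Calling last_capture("71122 75415 75555") gives "75555": the earlier groups were matched, but each iteration overwrote the capture. Iterating over separate matches, as in the loop above, sidesteps this. Boost.Regex does document an optional repeated-captures mode (match_extra), but it needs a specially configured build, so I would not rely on it here.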

If the string cannot be modified, the leading [[:space:]]+ can be dropped from the pattern. The pattern can then match the empty string, so the loop has to skip empty matches:

const char* pat = "((7[0-9]{4})?([[:space:]]+8[0-9]{4})?([[:space:]]+9[0-9]{4})?([[:space:]]+0[0-9]{4})?)";
boost::regex reg(pat);
boost::smatch match;
std::string example= "71122 85451 75415 01102 75555 82133 91341 02134";

boost::sregex_token_iterator i(example.begin(), example.end(), reg, 1);
boost::sregex_token_iterator j;

while (i != j)
{
  if(i->length() != 0) {
    std::cout << "Match: " << *i << std::endl;
  }

  ++i;
}

Although in that case it'd arguably be nicer to use regex_iterator instead of regex_token_iterator:

// No need for the outer parentheses anymore: regex_iterator yields the whole match.
const char* pat = "(7[0-9]{4})?([[:space:]]+8[0-9]{4})?([[:space:]]+9[0-9]{4})?([[:space:]]+0[0-9]{4})?";
boost::regex reg(pat);

boost::sregex_iterator i(example.begin(), example.end(), reg);
boost::sregex_iterator j;

// The loop stays the same: skip empty matches and print the rest.
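
To get the groups into a vector, as the question asks, the regex_iterator variant can be wrapped in a small function. This is only a sketch: it assumes std::regex, which offers the same iterator interface, so that it compiles without Boost, and split_runs is a name made up here. A run that does not start with a 7 group is matched together with its leading whitespace, so the sketch trims that.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits the input into runs of 7/8/9/0 groups,
// returning one string per run.
std::vector<std::string> split_runs(const std::string& input)
{
    // Same pattern as the regex_iterator variant. Every part is optional,
    // so the expression can also match the empty string.
    static const std::regex reg(
        "(7[0-9]{4})?([[:space:]]+8[0-9]{4})?"
        "([[:space:]]+9[0-9]{4})?([[:space:]]+0[0-9]{4})?");

    std::vector<std::string> runs;
    std::sregex_iterator end;
    for (std::sregex_iterator i(input.begin(), input.end(), reg); i != end; ++i) {
        std::string run = i->str();
        // Drop leading whitespace; an empty match becomes an empty string.
        run.erase(0, run.find_first_not_of(" \t\r\n"));
        if (!run.empty()) {
            runs.push_back(run);
        }
    }
    return runs;
}
```

For the example string, split_runs should return "71122 85451", "75415 01102" and "75555 82133 91341 02134". With Boost, swapping the std:: regex types for their boost:: counterparts should be all that changes, though I have not verified that.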
