Reputation: 20225
I am trying to parse a simple sentence structure with Boost. This is my first time using Boost, so I could be doing this completely wrong. What I want to do is only accept strings in this format:
Since I don't know what characters are my delimiters (there could be tons), I have tried to write a regex that matches the words themselves instead. The only problem is that I am only getting the last letter of each word. This leads me to believe that my regex is correct, but my use of Boost is not. Here's my code:
boost::regex regexp("[A-Za-z]([A-Za-z]|[0-9]|_|-)*", boost::regex::normal | boost::regbase::icase);
boost::sregex_token_iterator i(text.begin(), text.end(), regexp, 1);
boost::sregex_token_iterator j;
while(i != j){
cout << *i++ << std::endl;
}
I modeled this after what I found on the Boost website. I used the last example (at the bottom of the page) as a template to build my code. In this instance, text is an object of type string.
Is my regex correct? Am I using boost correctly?
Upvotes: 0
Views: 368
Reputation: 19981
You're requesting the first submatch for each regex match. That refers to this subexpression: ([A-Za-z]|[0-9]|_|-)
Because that group is qualified by a *, it is matched repeatedly, and what you get back is only the last thing it matched in each match. Hence, the last character. I think you should pass 0 for the submatch number, or just omit that parameter. When I modify your code to do that, it does what I think you want it to do.
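To make the difference between submatch 0 and submatch 1 concrete, here is a minimal sketch. Note the assumptions: it uses std::regex as a stand-in for Boost.Regex (the token iterator interface is modeled on Boost's, but I have not run this against Boost itself), and the helper name tokens_for_submatch is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the requested submatch of every match in `text`.
// submatch 0 is the whole match; submatch 1 is the first parenthesized group.
// Assumption: std::regex is used here in place of boost::regex.
std::vector<std::string> tokens_for_submatch(const std::string& text, int submatch) {
    // Same pattern as in the question.
    static const std::regex word("[A-Za-z]([A-Za-z]|[0-9]|_|-)*",
                                 std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), word, submatch), end;
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Called with the text "hello big_world", this should give "hello" and "big_world" for submatch 0, but only "o" and "d" for submatch 1, since the starred group keeps just its final repetition.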
Upvotes: 1
Reputation: 100668
Change your regex to: ([A-Za-z][-A-Za-z0-9_]*)
By putting the parentheses around the whole expression, the entire thing will be captured, not just the last character matched. Putting the - in front causes it to be a matched character and not a range specifier.
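A rough sketch of how this pattern behaves when submatch 1 is kept, as in the original code. As an assumption, it again uses std::regex rather than Boost.Regex, and capture_words is a hypothetical name chosen for the example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return group 1 of every match in `text`.
// Assumption: std::regex stands in for boost::regex.
std::vector<std::string> capture_words(const std::string& text) {
    // The parentheses wrap the whole expression, so group 1 is the full word.
    // The leading '-' inside the brackets is a literal hyphen, not a range.
    static const std::regex word("([A-Za-z][-A-Za-z0-9_]*)");
    std::vector<std::string> out;
    std::sregex_token_iterator it(text.begin(), text.end(), word, 1), end;
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Because a word must start with a letter, input such as "9lives" is picked up as "lives"; whether that is acceptable depends on the sentence format you want to enforce.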
Upvotes: 2