Reputation: 1793
I need to use regular expressions to match special keys and values. There is one special condition that I do not know how to handle.
The string looks like "abcd/abcd". I need to match all the single tokens before the "/".
So I wrote (.)*/, but I found that it only captures one token (d). Moreover, even if it matched everything I need, I still would not know how many tokens were matched.
What should the correct regular expression be? The real condition is much more complex than this example, so if it can be done with regular expressions, I would rather not write a tokenizer.
Upvotes: 1
Views: 4440
Reputation: 627292
The Boost library that you are using provides a way to capture repeated groups into a stack, provided the library was compiled with BOOST_REGEX_MATCH_EXTRA defined; otherwise the match results object (what in the demo below) won't have a member named captures. When you call boost::regex_search or boost::regex_match, pass the boost::match_extra flag, and every value matched by your (.)* (a single-character group repeated zero or more times) will be captured into a stack that is accessible via the captures member of the sub_match object.
Here is a demo function from the official Boost documentation:
#include <boost/regex.hpp>
#include <iostream>
#include <string>

// Requires Boost.Regex built with BOOST_REGEX_MATCH_EXTRA defined;
// otherwise match_results has no captures() member.
void print_captures(const std::string& regx, const std::string& text)
{
    boost::regex e(regx);
    boost::smatch what;
    std::cout << "Expression: \"" << regx << "\"\n";
    std::cout << "Text: \"" << text << "\"\n";
    // match_extra asks the matcher to record every repetition of each group.
    if (boost::regex_match(text, what, e, boost::match_extra))
    {
        unsigned i, j;
        std::cout << "** Match found **\n Sub-Expressions:\n";
        for (i = 0; i < what.size(); ++i)
            std::cout << " $" << i << " = \"" << what[i] << "\"\n";
        std::cout << " Captures:\n";
        for (i = 0; i < what.size(); ++i)
        {
            // what.captures(i) holds every value group i captured, in order.
            std::cout << " $" << i << " = {";
            for (j = 0; j < what.captures(i).size(); ++j)
            {
                if (j)
                    std::cout << ", ";
                else
                    std::cout << " ";
                std::cout << "\"" << what.captures(i)[j] << "\"";
            }
            std::cout << " }\n";
        }
    }
    else
    {
        std::cout << "** No Match found **\n";
    }
}

int main(int, char*[])
{
    print_captures("(.*)bar|(.*)bah", "abcbar");
    return 0;
}
Upvotes: 2
Reputation: 95998
Why your regex doesn't work
The regex (.)*/ matches any character, zero or more times, followed by a "/".
The * quantifier is greedy: it tries to match as much as it can. Given the string "abcd/abcd", (.)* first consumes all of "abcd/abcd", then fails to match "/", so the engine backtracks one character at a time until (.)* has matched only "abcd" and the next character is the "/", which then matches. The () is a capturing group, but because the group itself is repeated, it only keeps the last character it matched ("d").
How to fix it
[^\/]*
This matches any run of characters that are not a "/". The "/" is escaped here, which is only required in languages that delimit regex literals with slashes. If you want to capture the matched text, wrap it in a group, ([^\/]*), and the first group will then contain everything before the "/" (for example "abcd").
Note that, depending on the language you're using, there may be other solutions that do not involve regular expressions at all.
Upvotes: 2