Reputation: 3984
I compared with Linux C regex library,
#include <iostream>
#include <chrono>
#include <regex.h>
int main()
{
const int count = 100000;
regex_t exp;
int rv = regcomp(&exp, R"_(([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?)_", REG_EXTENDED);
if (rv != 0) {
std::cout << "regcomp failed with " << rv << std::endl;
}
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < count; i++)
{
regmatch_t match;
const char *sz = "http://www.abc.com";
if (regexec(&exp, sz, 1, &match, 0) == 0) {
// std::cout << sz << " matches characters " << match.rm_so << " - " << match.rm_eo << std::endl;
} else {
// std::cout << sz << " does not match" << std::endl;
}
}
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << elapsed.count() << std::endl;
return 0;
}
The result is roughly 60-70 milliseconds on my testing machine.
Then I used libc++'s library,
#include <iostream>
#include <chrono>
#include <regex>
int main()
{
const int count = 100000;
std::regex rgx(R"_(([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?)_", std::regex_constants::extended);
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < count; i++)
{
std::cmatch match;
const char sz[] = "http://www.abc.com";
if (regex_search(sz, match, rgx)) {
} else {
}
}
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "regex_search: " << elapsed.count() << std::endl;
start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < count; i++)
{
const char sz[] = "http://www.abc.com";
if (regex_match(sz, rgx)) {
} else {
}
}
end = std::chrono::high_resolution_clock::now();
elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "regex_match: " << elapsed.count() << std::endl;
return 0;
}
The result is roughly 2 seconds for both regex_search & regex_match. This is about 30 times slower than C's regex.h library.
Is there anything wrong with my comparison? Is C++'s regex library not suitable for high performance case?
I can understand it's slow because there's no optimization in c++'s regex library yet, but 30 times slower is just too much.
Thanks.
Hi all,
Thanks for answering.
Sorry for my mistake I was using [] for C too but later I changed, and forgot to change C++ code.
I made two changes,
This is still a bit slow, but not as much as the original comparison.
Upvotes: 10
Views: 10344
Reputation: 106226
If you take a look at http://llvm.org/svn/llvm-project/libcxx/trunk/include/regex you'll see this implementation of regex_match
is layered atop regex_search
, and all overloads extract sub-expression match positions even if only into local temporaries that are thrown away. regex_search
uses a vector
of __state
objects that have .resize()
called on them so are presumably vectors too - all heap allocations and unnecessary when the subexpression matches aren't wanted, but would need to be tracked to support \1
etc in perl-style extensions to regular expressions: the old regcomp
/regexec
C functions didn't provide those extended features never have to do this extra work. Of course it would be nice if the clang implementation checked the regular expression's need for tracking matches during compilation and called leaner, faster functions to match when possible, but I guess they're just starting with support for the general case.
Upvotes: 14
Reputation: 93556
sz
and match
are loop invariant, you should move them to before (in both cases for sz
).
In the second case sz
is an initialised array instead of a pointer to a constant literal - that is an unfair and unnecessary difference. That said, if you move the declaration to before the loop as suggested, that should make little or no difference.
Although regex_search()
is overloaded for const const char*
that may internally cause construction of a std::string
, to avoid that possibility you should test it with:
const std::string sz( "http://www.abc.com" ) ;
(again before the loop).
So test:
std::cmatch match;
const char* = "http://www.abc.com";
for (int i = 0; i < count; i++)
{
if (regex_search(sz, match, rgx)) {
} else {
}
}
and
std::cmatch match;
const std::string sz( "http://www.abc.com" )
for (int i = 0; i < count; i++)
{
if (regex_search(sz, match, rgx)) {
} else {
}
}
Upvotes: 2
Reputation: 385295
The following two lines do not do the same thing!
const char sz1[] = "http://www.abc.com";
const char* sz2 = "http://www.abc.com";
That's already enough to make it an unfair test.
Upvotes: 11