Reputation: 181
I have a log file of about 400k lines, and my C++ code processes it much more slowly than the equivalent Perl code. To compare the two, I wrote a simple loop over the log file that applies the same regex in C++ and in Perl. The Perl script runs very fast, while the C++ version takes a long time.
In C++ I use the standard <regex> library (#include <regex>), whereas in Perl regexes are built into the language.
How can I make the C++ code as efficient as the Perl code, given that Perl itself is implemented in C?
regex log_line("(\\d{1,2}\\/[A-Za-z]{3}\\/\\d{1,4}):(\\d{1,2}:\\d{1,2}:\\d{1,2}).*?\".*?[\\s]+(.*?)[\\s\?].*?\"[\\s]+([\\d]{3})[\\s]+(\\d+)[\\s]+\"(.*?)\"[\\s]+\"(.*?)\"[\\s]+(\\d+)");
string line;
int count =0;
smatch match;
while(getline(logFileHandle, line)){
if(regex_search(line, match, log_line)){
count++;
}
}
open(LOG_FILE,"<$log_file_location");
my $count=0;
while($thisLine = <LOG_FILE>){
if((($datePart, $time, $requestUrl, $status, $bytesDelivered, $httpReferer, $httpUserAgent, $requestReceived) = $thisLine =~ /(\d{1,2}\/[A-Za-z]{3}\/\d{1,4}):(\d{1,2}:\d{1,2}:\d{1,2}).*?\".*?[\s]+(.*?)[\s\?].*?\"[\s]+([\d]{3})[\s]+(\d+)[\s]+\"(.*?)\"[\s]+\"(.*?)\"[\s]+(\d+)/o) == 8){
$count++;
}
}
I apologize if my question is not in the right format; if something is missing, let me know. Thanks.
EDIT 1: I used the chrono library in C++ to measure the time taken; the output is below. I took a sample of the log file to keep things simple. Just reading the log file and counting the lines takes 57 ms. With regex_search added, the same sample takes 2462 ms.
No of Lines27399
With regex + logfileRead
Time taken by function: 2462 milliseconds
No of Lines27399
With just simple logfileRead
Time taken by function: 57 milliseconds
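For reference, a measurement like the one above could be set up roughly as follows. This is a minimal sketch, not the exact code that produced those numbers: PassResult and timed_pass are hypothetical names, and the input is taken as any std::istream so the same function works for a file or an in-memory sample. Passing a null pattern gives the "just read the file" baseline.

```cpp
#include <chrono>
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Result of one timed pass over the input (hypothetical helper type).
struct PassResult {
    long lines;    // number of lines read
    long matches;  // number of lines for which regex_search succeeded
    long long ms;  // elapsed wall-clock time in milliseconds
};

// Reads every line from `in`. If `pattern` is non-null, regex_search is
// also run on each line, so the two timings can be compared directly.
PassResult timed_pass(std::istream& in, const std::regex* pattern) {
    using clock = std::chrono::steady_clock;
    PassResult r{0, 0, 0};
    std::string line;
    std::smatch match;
    const auto start = clock::now();
    while (std::getline(in, line)) {
        ++r.lines;
        if (pattern && std::regex_search(line, match, *pattern)) {
            ++r.matches;
        }
    }
    r.ms = std::chrono::duration_cast<std::chrono::milliseconds>(
               clock::now() - start).count();
    return r;
}
```

Calling timed_pass(logFileHandle, nullptr) and then, after reopening the file, timed_pass(logFileHandle, &log_line) would give the two figures to compare. Building with optimizations enabled matters a lot for std::regex, so the timings are only meaningful for a release build.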
Upvotes: 1
Views: 1461
Reputation: 853
Use a code generator tool like re2c or ragel to compile your regular expression into C/C++ code (which can be optimized by the compiler).
Alternatively, Boost.Regex -- which was the basis for std::regex -- may be faster than your std::regex implementation.
Also, the bottleneck might be I/O rather than regular expressions; see the question "Why is reading lines from stdin much slower in C++ than Python?"
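To give a feel for the code-generator suggestion above, here is a hand-written sketch of the kind of direct, table-free matching code that such tools aim to produce. It is not actual re2c or ragel output, and it covers only the leading date and time part of the pattern, (\d{1,2}/[A-Za-z]{3}/\d{1,4}):(\d{1,2}:\d{1,2}:\d{1,2}). The names Timestamp, find_timestamp, digits, letters and lit are hypothetical.

```cpp
#include <cstddef>
#include <string>

// The two captured fields (hypothetical type for this sketch).
struct Timestamp {
    std::string date;  // e.g. "10/Oct/2000"
    std::string time;  // e.g. "13:55:36"
};

// Advances pos over at least min_n and at most max_n ASCII digits.
static bool digits(const std::string& s, std::size_t& pos, int min_n, int max_n) {
    int n = 0;
    while (n < max_n && pos < s.size() && s[pos] >= '0' && s[pos] <= '9') {
        ++pos;
        ++n;
    }
    return n >= min_n;
}

// Advances pos over exactly n ASCII letters.
static bool letters(const std::string& s, std::size_t& pos, int n) {
    for (int i = 0; i < n; ++i, ++pos) {
        if (pos >= s.size()) return false;
        const char c = s[pos];
        if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))) return false;
    }
    return true;
}

// Advances pos over the single literal character c.
static bool lit(const std::string& s, std::size_t& pos, char c) {
    if (pos < s.size() && s[pos] == c) {
        ++pos;
        return true;
    }
    return false;
}

// Finds the leftmost date:time stamp in s, like regex_search would.
bool find_timestamp(const std::string& s, Timestamp& out) {
    for (std::size_t start = 0; start < s.size(); ++start) {
        std::size_t p = start;
        if (!(digits(s, p, 1, 2) && lit(s, p, '/') && letters(s, p, 3) &&
              lit(s, p, '/') && digits(s, p, 1, 4))) {
            continue;
        }
        const std::size_t date_end = p;
        if (!lit(s, p, ':')) continue;
        const std::size_t time_begin = p;
        if (!(digits(s, p, 1, 2) && lit(s, p, ':') && digits(s, p, 1, 2) &&
              lit(s, p, ':') && digits(s, p, 1, 2))) {
            continue;
        }
        out.date = s.substr(start, date_end - start);
        out.time = s.substr(time_begin, p - time_begin);
        return true;
    }
    return false;
}
```

Because each piece of the pattern becomes ordinary branching code, the compiler can inline and optimize it, which is the benefit the generator approach is after. Extending this by hand to the full pattern quickly becomes tedious, which is where a tool earns its keep.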
Upvotes: 2