Perl Regular Expression running faster than C++ Boost Implementation

Question

I am confused about what is happening here. Most benchmarks I have seen show Boost performing close to Perl or even beating it. In my tests, however, the Perl implementation is faster by a factor of 5-6.

In both test_script.cpp and test_script.pl I open a file and read it line by line, populating an array. Then I run each string against a list of regex definitions in linear order until one matches, in which case nothing happens (I/O was removed for testing purposes), and move on to the next string, until all strings have been compared.

Test_script.pl:

#make incomingList, which contains all incoming strings
my $start = Time::HiRes::gettimeofday();

foreach (@incomingList) {
  my $inString = $_;
  &find_pattern($inString);
}

my $end = Time::HiRes::gettimeofday();
printf("%.6f\n", $end - $start);

find_pattern subroutine:

sub find_pattern {
  my $URLString = $_[0];

  #1 rewrite
  if($URLString =~ m/^\/stuff\/brands-([^\/]*)\/(.*)?$/) {

  }
  #2 rewrite
  elsif($URLString =~ m/^\/coupons(\/.*)?$/){

  }
  #3 rewrite
  elsif($URLString =~ m/^\/han\/(.+)$/){

  }
  # ...continues on, there are 100 patterns. 
}

test_script.cpp, main method:

populateArray();
//make stringArr, which contains all incoming strings
struct timeval time;
gettimeofday(&time, NULL);
double t1=time.tv_sec+(time.tv_usec/1000000.0);   

for(int j =0; j < 10000; j++){
  getRule(stringArr[j]);
 }

gettimeofday(&time, NULL);
double t2=time.tv_sec+(time.tv_usec/1000000.0);
printf("%.6lf seconds elapsed\n", t2-t1);

populateArray method:

static void populateArray(){
  // '/' needs no escaping in a C++ string literal or in a Boost regex
  regexArray[1] = boost::regex("/stuff/brands-([^/]*)/(.*)?");
  regexArray[2] = boost::regex("/coupons(/.*)?");
  regexArray[3] = boost::regex("/han/(.+)");
  //continues on, 100 definitions.
}

getRule method:

static void getRule(string inQuery){
  for(int i =1; i < 100; i++){
    // stop at the first rule that matches the whole string
    if(boost::regex_match(inQuery, regexArray[i])){
      break;
    }
  }
}

I understand that a linear chain of if/elsif checks in Perl might seem a little odd, but I need it because each rule has to be reformatted independently later. Regardless, unless I'm misunderstanding something, the two scripts are doing the same work: each walks down the list of regex definitions until it finds a match, then moves on to the next incoming string.
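To make the comparison concrete, here is a minimal self-contained sketch of the first-match scan described above. It is an assumption-laden illustration, not my actual test code: it swaps in std::regex (C++11) for boost::regex so it builds without Boost, it contains only the three patterns shown, and the names buildRules and firstMatchingRule are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: compile the rule list once, up front.
// Only the three patterns shown in the question are included.
static std::vector<std::regex> buildRules() {
    return {
        std::regex("/stuff/brands-([^/]*)/(.*)?"),
        std::regex("/coupons(/.*)?"),
        std::regex("/han/(.+)"),
    };
}

// Returns the zero-based index of the first rule that matches the whole
// input, or -1 if no rule matches. regex_match requires the entire string
// to match, which roughly corresponds to the ^...$ anchors in the Perl code.
static int firstMatchingRule(const std::string& input,
                             const std::vector<std::regex>& rules) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (std::regex_match(input, rules[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The input is taken by const reference here so that no string is copied per call; whether that matters for the timings above is not something I have measured.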

So why are the results so different? For 100 rules (the same set in both scripts) and 10,000 inputs, the C++ program averages around 0.155 seconds and the Perl script averages around 0.028 seconds. Edit: with compiler optimization enabled, the C++ program runs in roughly 0.091 seconds, which is still slower.
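For anyone reproducing the averages, one way to take them is sketched below. This is not the harness I used (that is the gettimeofday code above); it assumes C++11, uses std::chrono::steady_clock because it is monotonic, and the name meanSeconds is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: calls `work` `repeats` times back to back and
// returns the mean wall-clock time per call, in seconds.
// Assumes repeats > 0.
template <typename Work>
double meanSeconds(Work work, int repeats) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}
```

Passing a lambda that loops over all 10,000 inputs as `work` would give a per-pass average comparable to the figures quoted above.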

Any insight is appreciated.
