Kudu
Kudu

Reputation: 6888

Which regex is more efficient?

s/(?P<head>\[\[foo[^\[]*)abc/\g<head>def

s/(?=\[\[foo[^\[]*)abc/def

Which is more efficient? Are there any other ways to make it more efficient? Please note that although I used Perl-style syntax for illustration purposes, I'm actually using Python's re library, which doesn't allow the \K (keep) keyword.

Upvotes: 3

Views: 306

Answers (1)

chown
chown

Reputation: 52738

(?P<head>\[\[foo[^\[]*)abc in python using the re module is way faster:

import time
import re

rec1 = re.compile('(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile('(?=\[\[foo[^\[]*)abc')

total1, total2 = 0.0, 0.0

def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100)
    t1 = time.time()
    if ver is 1:
        rec1.sub("def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)

for x in xrange(50000):
    total1 += timeRE(1)

for x in xrange(50000):
    total2 += timeRE(2)

print total1
print total2

Outputs:

4.27380466461
16.9591507912

Edit (Run a few more times doing both calls in the same loop):

for x in xrange(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)

Outputs:

4.26199269295
17.2384319305

Edit (Fixing sub to match question):

import time
import re
rec1 = re.compile('(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile('(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0
def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" * 100)
    t1 = time.time()
    if ver is 1:
        rec1.sub("\g<head>def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)

for x in xrange(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)
print total1
print total2

Outputs:

Run 1:
4.62282061577
17.8212277889

Run 2:    
4.6660721302
17.1630160809

Run 3:
4.62124109268
17.21393013

Edit (with a string that will match the REGEX):

import time
import re

rec1 = re.compile('(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile('(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0

def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_<head>[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz" * 100)
    t1 = time.time()
    if ver is 1:
        rec1.sub("\g<head>def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)

for x in xrange(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)

print total1
print total2

Output:

23.4271130562
29.6934807301

And one final run:

import time
import re
rec1 = re.compile('(?P<head>\[\[foo[^\[]*)abc')
rec2 = re.compile('(?=\[\[foo[^\[]*)abc')
total1, total2 = 0.0, 0.0
def timeRE(ver):
    x = ("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_1234567890_<head>_<tail>_</head>_</tail>_abcdefghijklmnopqrstuvwxyz_<head>[[fooBAR_ABCDEFGHIJKLMNOPQRSTUVWXYZ_abc]]]]defghiojklmnopqrstuvwyz" * 100)
    t1 = time.time()
    if ver is 1:
        rec1.sub("\g<head>def", x)
    else:
        rec2.sub("def", x)
    return (time.time() - t1)
for x in xrange(50000):
    total1 += timeRE(1)
    total2 += timeRE(2)
print "Method 1: Avg run took: %+0.7f - With a total of: %+0.7f" % ((total1 / 50000.0), total1)
print "Method 2: Avg run took: %+0.7f - With a total of: %+0.7f" % ((total2 / 50000.0), total2)

Output:

Method 1: Avg run took: +0.0004924 - With a total of: +24.6196477
Method 2: Avg run took: +0.0005921 - With a total of: +29.6053855

Upvotes: 6

Related Questions