Tiffany

Reputation: 25

Regex and no. of occurrences

I want a regex that matches a string of 2 to 15 characters containing only uppercase or lowercase letters and digits, with hyphens allowed only between other characters (not at the start or end). This is what I have so far:

/^[a-z0-9][a-z0-9\-]{0,13}[a-z0-9]$/i

It should match strings such as: a-b-c, ab, a-b, a-bcdef-g-h-i

How do I make sure that two or more hyphens never appear in a row?

Upvotes: 1

Views: 137

Answers (1)

Tim Pietzcker

Reputation: 336138

You could use a negative lookahead assertion:

/^(?!.*--)[a-z0-9][a-z0-9-]{0,13}[a-z0-9]$/i

(?!.*--) ensures that it's impossible to match -- anywhere in the string (without actually consuming any characters in the match).

Also, no need to escape the dash if it's the first or last character in a character class.
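
As a quick sanity check, here is a minimal Python sketch of the lookahead pattern applied to the question's examples and a few strings that should be rejected. The helper name is_valid and the list of test strings are illustrative assumptions, and re.IGNORECASE stands in for the /i flag.

```python
import re

# Lookahead pattern from the answer; re.IGNORECASE replaces the /i flag.
# The dash is left unescaped because it is the last character in the class.
PATTERN = re.compile(r"^(?!.*--)[a-z0-9][a-z0-9-]{0,13}[a-z0-9]$", re.IGNORECASE)


def is_valid(name):
    """Hypothetical helper: True if the whole string satisfies the pattern."""
    return PATTERN.match(name) is not None


# The first four strings come from the question; the rest are made-up rejects.
for candidate in ["a-b-c", "ab", "a-b", "a-bcdef-g-h-i", "a--b", "-ab", "ab-", "a"]:
    print(candidate, is_valid(candidate))
```

The four strings from the question are accepted; "a--b" is rejected by the lookahead, "-ab" and "ab-" by the alphanumeric first and last characters, and "a" by the minimum length of two.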

If, like Donal Fellows, you're not keen on lookaheads with indefinite quantifiers, another way would be

/^[a-z0-9](?:[a-z0-9]|-(?!-)){0,13}[a-z0-9]$/i

(?:[a-z0-9]|-(?!-)){0,13} matches either an alphanumeric character or a dash if it's not followed by another dash, repeating up to 13 times.
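
To see that the two patterns accept the same strings, one option is a brute-force comparison over all short strings built from a small alphabet. This is only a sketch: the function name disagreements, the alphabet "aZ9-" and the length limit of 6 are arbitrary assumptions chosen to keep the search small.

```python
import itertools
import re

# Both patterns from the answer, with re.I standing in for the /i flag.
LOOKAHEAD = re.compile(r"^(?!.*--)[a-z0-9][a-z0-9-]{0,13}[a-z0-9]$", re.I)
ALTERNATION = re.compile(r"^[a-z0-9](?:[a-z0-9]|-(?!-)){0,13}[a-z0-9]$", re.I)


def disagreements(alphabet="aZ9-", max_len=6):
    """Return every string (up to max_len) on which the two patterns differ."""
    found = []
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            text = "".join(chars)
            if bool(LOOKAHEAD.match(text)) != bool(ALTERNATION.match(text)):
                found.append(text)
    return found


print(disagreements())
```

An empty list means the patterns agree on every string tried. It is not a proof for longer inputs, so the 15-character upper bound is worth checking separately.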

As for performance (checked in Python 3.2.2):

>>> import timeit
>>> timeit.timeit(stmt='r.match("a--bcdefghijklmop-qrstuvwxyz")', 
... setup='import re; r=re.compile(r"^(?!.*--)[a-z0-9][a-z0-9-]{0,13}[a-z0-9]$")')
0.699529247317531
>>> timeit.timeit(stmt='r.match("a--bcdefghijklmop-qrstuvwxyz")', 
... setup='import re; r=re.compile(r"^[a-z0-9](?:[a-z0-9]|-(?!-)){0,13}[a-z0-9]$")')
0.6518945164968741
>>> timeit.timeit(stmt='r.match("a-bcdefghijklmop-qrstuvwxy--z")', 
... setup='import re; r=re.compile(r"^(?!.*--)[a-z0-9][a-z0-9-]{0,13}[a-z0-9]$")')
0.5857406334929749
>>> timeit.timeit(stmt='r.match("a-bcdefghijklmop-qrstuvwxy--z")', 
... setup='import re; r=re.compile(r"^[a-z0-9](?:[a-z0-9]|-(?!-)){0,13}[a-z0-9]$")')
2.2273210211646415

So (?!.*--) is slightly slower in its worst case (-- early in the string, so .* has to backtrack a long way before it finds it), but nearly four times faster in its best case (-- late in the string, so almost no backtracking).
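
For anyone who wants to repeat the measurement, here is a rough script version of the session above. The names PATTERNS, SUBJECTS and time_match are made up for this sketch, and it runs 100,000 iterations per case instead of timeit's default of one million, so the absolute numbers will not be comparable to the ones quoted; only the ratios are of interest, and they will vary with machine and Python version.

```python
import re
import timeit

# Same patterns and subject strings as in the interactive session above.
PATTERNS = {
    "lookahead": r"^(?!.*--)[a-z0-9][a-z0-9-]{0,13}[a-z0-9]$",
    "alternation": r"^[a-z0-9](?:[a-z0-9]|-(?!-)){0,13}[a-z0-9]$",
}
SUBJECTS = {
    "early --": "a--bcdefghijklmop-qrstuvwxyz",
    "late --": "a-bcdefghijklmop-qrstuvwxy--z",
}


def time_match(pattern, subject, number=100_000):
    """Return the seconds taken by `number` match attempts on `subject`."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.match(subject), number=number)


for pattern_name, pattern in PATTERNS.items():
    for subject_name, subject in SUBJECTS.items():
        seconds = time_match(pattern, subject)
        print(f"{pattern_name:12} {subject_name:9} {seconds:.4f}s")
```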

Upvotes: 5
