Earlz

Reputation: 63815

Regex vs. Manual comparison. Which is faster?

In writing a scripting engine, I have functions like this (pseudo-code):

function is_whitespace?(char c){
  return c==' ' || c=='\t' || c=='\r' || c=='\n';
}

Well, my question is: which is faster in most languages? That, or using a regex like

function is_whitespace?(char c){
  return regex_match('\s',c);
}

The chief languages I'm concerned with are C#, C, and Ruby, in case the answer is completely platform-dependent.

Upvotes: 11

Views: 3777

Answers (5)

Andrew Grimm

Reputation: 81450

I can't speak about C# or C, but I wouldn't assume the non-regex form is faster in Ruby.

Upvotes: 0

user508546

Reputation: 441

After disk usage, regexes are almost always my performance bottleneck when I profile my code, even for simple things like .split(" ").

Upvotes: 1

Jan Goyvaerts

Reputation: 21999

The manual comparison is faster to execute, the regex comparison is faster to type.

Note that your two implementations are not equivalent if your system uses Unicode. In a Unicode-aware regex engine such as .NET's, \s matches all Unicode whitespace, while your manual comparison only handles basic ASCII and does not even include the vertical tab and form feed characters, which are usually also considered whitespace.
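
As a rough illustration in Ruby (one of the languages the question asks about), the sketch below lists the characters on which the two checks disagree. The helper names are hypothetical and not from the original post. Note that Ruby's \s covers only ASCII whitespace, so the disagreements to expect here are the form feed and, in current Ruby versions, the vertical tab; other engines may differ.

```ruby
# Hypothetical helper names; neither comes from the original post.

# Manual check from the question: only space, tab, CR and LF.
def manual_whitespace?(c)
  c == ' ' || c == "\t" || c == "\r" || c == "\n"
end

# Regex check. \A and \z anchor the match to the whole one-character string.
def regex_whitespace?(c)
  c.match?(/\A\s\z/)
end

# Returns the characters on which the two checks give different answers.
def disagreements(chars)
  chars.reject { |c| manual_whitespace?(c) == regex_whitespace?(c) }
end

candidates = [' ', "\t", "\r", "\n", "\v", "\f", 'a']
p disagreements(candidates)
```

If the two checks need to behave identically, one option is to spell out the character class in the regex (for example /[ \t\r\n]/) rather than rely on \s.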

If you're writing this in a high-level language I'd suggest using the is_whitespace() function already provided by your programming language's libraries. A basic function like that is almost always included.

So in the end the answer is "it depends". In some situations the extra programming effort of using procedural code is warranted. In many cases the regex is fast enough and easier to maintain.

Upvotes: 4

dawg

Reputation: 103744

In most cases, a regex that finds something like a whitespace character is very fast. Many eyeballs are looking at performance in the leading regex implementations, and there is probably other 'low-hanging fruit' for optimization elsewhere in your code.

Bad regex performance usually comes from a poorly written regex. Tips: avoid as much unnecessary backtracking, grouping and alternation as possible. Use something like RegexBuddy, or Perl with "use re 'debug'", to see how many branches your regex takes.

The links below discuss some regex performance issues.

If in doubt, do comparative timings...

Coding Horror - Regex

Java Performance - Regex
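
The comparative timing suggested above could be sketched in Ruby roughly as follows, using the standard Benchmark module. The helper names and the sample text are assumptions made for illustration, and the regex is restricted to the same four characters as the manual check so both do equivalent work. Actual numbers depend heavily on the Ruby version and the input.

```ruby
require 'benchmark'

# Hypothetical helpers and sample data, not from the original thread.
def manual_whitespace?(c)
  c == ' ' || c == "\t" || c == "\r" || c == "\n"
end

# Same four characters as the manual check, compiled once as a constant.
WHITESPACE_RE = /\A[ \t\r\n]\z/

def regex_whitespace?(c)
  c.match?(WHITESPACE_RE)
end

# Counts the whitespace characters in chars using the named predicate.
def count_whitespace(chars, predicate)
  chars.count { |c| send(predicate, c) }
end

chars = ("some text\twith\r\nmixed whitespace " * 10_000).chars

# Prints user, system and wall-clock times for each variant.
Benchmark.bm(8) do |x|
  x.report('manual:') { count_whitespace(chars, :manual_whitespace?) }
  x.report('regex:')  { count_whitespace(chars, :regex_whitespace?) }
end
```

Running each variant several times, on input that resembles the real workload, gives a more trustworthy picture than a single pass.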

Upvotes: 1

wRAR

Reputation: 25693

Of course, four comparisons of small chunks of memory are much faster (and use almost no memory) than building, running and destroying a state machine.

Upvotes: 17
