mike3996
mike3996

Reputation: 17497

What happens when you compile regular expressions?

We all know that you can compile your much-used regular expressions into something that performs very good. But what is this witchcraft happening behind the curtains?

I guess that a finite state automaton gets built there, but you must know better than me.

Upvotes: 8

Views: 629

Answers (2)

Corbin March
Corbin March

Reputation: 25704

The details of regular expression compilation vary by implementation. For example, compilation in Python or re2 simply creates an instance of a regular expression object. This object's state machine may be modeled as a graph or virtual machine. Without compilation (example: RE.match(expression, input)), a new regular expression object is created behind the scenes each time match is called. This is needless work if you're going to use an expression more than once.

In C#, one of three things can happen when you compile:

  1. A regular expression object is created (implemented as a virtual machine) similar to Python and re2.
  2. A regular expression object is created and its virtual machine opcodes are compiled to in-memory IL instructions on-the-fly.
  3. A regular expression object is created and its virtual machine opcodes are compiled to disk as IL instructions.

You mention an interest in algorithms. Take a look at Russ Cox's excellent articles for two approaches:

Upvotes: 6

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 798526

Compiling a regular expression is similar to compiling Java or Python code; the regular expression is converted to an intermediate representation that the RE engine then interprets in order to perform the appropriate operations upon a string.

Upvotes: 1

Related Questions