Michael Böckling
Michael Böckling

Reputation: 7952

Speed up regular expression

This is a regex to extract the table name from a SQL statement:

(?:\sFROM\s|\sINTO\s|\sNEXTVAL[\s\W]*|^UPDATE\s|\sJOIN\s)[\s`'"]*([\w\.-_]+)

It matches a token, optionally enclosed in [`'"], preceded by FROM etc. surrounded by whitespace, except for UPDATE which has no leading whitespace.

We execute many regexes, and this is the slowest one, and I'm not sure why. SQL strings can get up to 4k in size, and execution time is at worst 0,35ms on a 2.2GHz i7 MBP.

This is a slow input sample: https://pastebin.com/DnamKDPf

Can we do better? Splitting it up into multiple regexes would be an option, as well if alternation is an issues.

Upvotes: 3

Views: 747

Answers (3)

revo
revo

Reputation: 48761

There is a rule of thumb:

Do not let engine make an attempt on matching each single one character if there are some boundaries.

Try the following regex (~2500 steps on the given input string):

(?!FROM|INTO|NEXTVAL|UPDATE|JOIN)\S*\s*|\w+\W*(\w[\w\.-]*)

Live demo

Note: What you need is in the first capturing group.

The final regex according to comments (which is a little bit slower than the previous clean one):

(?!(?:FROM|INTO|NEXTVAL|UPDATE|JOIN)\b)\S*\s*|\b(?:NEXTVAL\W*|\w+\s[\s`'"]*)([\[\]\w\.-]+)

Upvotes: 1

CertainPerformance
CertainPerformance

Reputation: 371233

Because matches are often near the end, one possibility would be to essentially start at the end and backtrack, rather than start at the beginning and forward-track, something along the lines of

^(?:UPDATE\s|.*(?:\s(?:(?:FROM|INTO|JOIN)\s|NEXTVAL[\s\W]*)))[\s`'\"]*([\w\.-_]+)

https://regex101.com/r/SO7M87/1/ (154 steps)

While this may be much faster when a match exists, it's only a moderate improvement when there's no match, because the pattern must backtrack all the way to the beginning (~9000 steps from ~23k steps)

Upvotes: 1

Michał Ziober
Michał Ziober

Reputation: 38720

Regex optimisation is a very complex topic and should be done with help of some tools. For example, I like Regex101 which calculates for us number of steps Regex engine had to do to match pattern to payload. For your pattern and given example it prints:

1 match, 22976 steps (~19ms)

First thing which you can always do it is grouping similar parts to one group. For example, FROM, INTO and JOIN look similar, so we can write regex as below:

(?:\s(?:FROM|INTO|JOIN)\s|\sNEXTVAL[\s\W]*|^UPDATE\s)[\s`'"]*([\w\.-_]+)

For above example, Regex101, prints:

1 match, 15891 steps (~13ms)

Try to find some online tools which explain and optimise Regex such as myregextester and calculate how many steps engine needs to do.

Upvotes: 1

Related Questions