Reputation: 1287
I wonder how does regex work, my particular regex has an element that looks like this:
(word1|word2|wordn......)
The numbers of words is big several hundreds.
I wonder if the regex engine is just testing the words one by one or if it optimizes the search and it what way.
Any pointer to good documentation will be good.
Upvotes: 5
Views: 318
Reputation: 421340
If it seems to you that the RE engine is the bottleneck in such search, you could easily construct a trie and check for containment.
Upvotes: 1
Reputation: 10321
If you have several hundred words, you need to beware of the ordering of the words in the regex. The regex engine looks for the words from left to right.
If you test the word setValue
against the alternation set|setValue
, it will match only the 3 letters comprising "set", and not the whole string.
See this link (from www.regular-expressions.info) for the full explanation.
I don't think that the regex engine truly optimizes alternations (i.e., analyzing common prefixes and building nfa accordingly). Therefore, with so many words, I don't think it will be an optimization.
Aside from re-ordering the words, you can also try adding word or line boundary after the alternation, e.g. (set|setValue)$
, but I suspect that the regex engine will do a lot of backtracking so it may not be worth the effort.
Upvotes: 1