The list of words is very long, so I cannot paste the actual code that fails here. The regex whitelist has approximately 4500 words in it, separated by |.
Both regexes, whitelist and whitelist2, include the word "hello", but the test against each returns a different result. I have no idea why, since the same test in JavaScript gives the correct results.
Here is the ActionScript for testing. The line defining whitelist may not be fully visible; try copying and pasting the code from the link below into your text/code editor: http://wonderfl.net/c/jTmb/
Edit 1: The problem I'm facing is that sometimes the words are not an exact match. For example, "saturdays" needs to match "saturday". That's why I was using a regex.
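Why a regex covers this case: RegExp.test() with an unanchored alternation succeeds if any alternative occurs anywhere in the input, so "saturday" is found inside "saturdays". Below is a minimal sketch in TypeScript rather than ActionScript, so it can run outside Flash Player. AS3's ECMAScript-based RegExp should behave the same way for alternation and anchors. The three-word list stands in for the real whitelist, and the variable names are illustrative. The sketch also shows the flip side: any word that merely contains a whitelisted word, such as "othello", matches too.

```typescript
// Stand-in for the real ~4500-word whitelist (assumed: plain words, no regex metacharacters).
const whitelistSource: string = "hello|monday|saturday";

// Unanchored alternation: test() succeeds if any word occurs anywhere in the input.
const unanchored = new RegExp(whitelistSource);

// Anchored alternation: the entire input must equal one of the words.
const anchored = new RegExp("^(?:" + whitelistSource + ")$");

for (const word of ["saturday", "saturdays", "othello", "sunday"]) {
  // Prints the word, then the unanchored result, then the anchored result.
  console.log(word, unanchored.test(word), anchored.test(word));
}
// saturday true true
// saturdays true false
// othello true false
// sunday false false
```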
About the string length: I checked the length of the string, and it's being reported correctly: http://wonderfl.net/c/a9yp/
Edit 2: A test showing it works in JavaScript: http://tinyurl.com/m74hmdj
I'm thinking @tsiki is right about the max length of an AS3 regex.
This is really a comment, but since I'd like to include a bit of code, I'm putting it as an answer:
Since you're not using the regex for anything other than a list of words separated by |, consider using an array instead. Another advantage of this approach is that it will be quite a bit faster.
// This is just a way of reusing your list,
// rather than manually transforming it to an array:
var whitelist:Array = "abasement|abastardize|abastardize|..."
.split("|");
// Simply use .toLowerCase() on the input string to make it case insensitive,
// assuming all your whitelist words are lower case.
trace(whitelist.indexOf("hello") >= 0);
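Here is a TypeScript sketch of the array lookup above, including the .toLowerCase() normalization that the comment describes. TypeScript is used instead of ActionScript so the snippet can run on its own, and the logic maps directly onto the AS3 Array API. The short stand-in list and the function name isWhitelisted are hypothetical. Note that indexOf only finds exact matches, so by itself it does not cover the "saturdays" case from the question.

```typescript
// Stand-in for the real list; assumed to be entirely lower case.
const whitelist: string[] = "abasement|hello|saturday".split("|");

// Case-insensitive exact lookup: lower-case the input, then search the array.
function isWhitelisted(word: string): boolean {
  return whitelist.indexOf(word.toLowerCase()) >= 0;
}

console.log(isWhitelisted("Hello"));     // true
console.log(isWhitelisted("saturdays")); // false: indexOf needs an exact match
```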
ETA: Performance
Here are some performance comparisons.
_array is pre-initialized to a lower-case array of strings, split by |. _regex is pre-initialized to your regex. _search is pre-initialized to a given word to search for. I'm using your words up to (and including) the words starting with L, to get around the max regex length limitation.
The code for each test:
regex.test:
_regex.test(_search);
array.indexOf:
_array.indexOf(_search.toLowerCase()) >= 0;
loop over array:
for (var j:int = 0; j < _array.length; j++)
{
if (_array[j] == _search)
{
break;
}
}
Update: loop, indexOf (check whether any item in the whitelist is a substring of the search string):
for (var j:int = 0; j < _array.length; j++)
{
if (_search.indexOf(_array[j]) !== -1)
{
break;
}
}
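The substring loop is what handles the question's "saturdays should match saturday" case. It asks whether any whitelist entry occurs inside the search string, which has the same semantics as the unanchored regex. Below is a TypeScript sketch of that loop, written in TypeScript rather than ActionScript so it can run outside Flash. The word list and the function name containsWhitelistedWord are hypothetical.

```typescript
// Stand-in whitelist (assumed lower case); the real list has ~4500 words.
const whitelistWords: string[] = ["hello", "monday", "saturday"];

// True if any whitelist entry occurs inside `search`; mirrors the AS3
// check `_search.indexOf(_array[j]) !== -1`.
function containsWhitelistedWord(search: string, words: string[]): boolean {
  for (let j = 0; j < words.length; j++) {
    if (search.indexOf(words[j]) !== -1) {
      return true; // stop at the first hit, like the `break` in the AS3 loop
    }
  }
  return false;
}

console.log(containsWhitelistedWord("saturdays", whitelistWords)); // true
console.log(containsWhitelistedWord("sunday", whitelistWords));    // false
```

As with the regex, this also accepts words that merely contain a whitelisted word ("othello" contains "hello"), so it is a looser check than an exact lookup.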
The AS3 compiler doesn't do any unfair optimization of this simple code (such as skipping executions due to not using the result - it's not all that clever).
10 runs, 1000 iterations each, FP 11.4.402.278 - release version:
Method           Search for     Avg.       Min      Max      Iter.
---------------------------------------------------------------------------
array.indexOf    "abasement"      0.0 ms     0 ms     0 ms   0 ms
regex.test       "abasement"     18.4 ms    14 ms    22 ms   0.0184 ms
loop over array  "abasement"      0.0 ms     0 ms     0 ms   0 ms
loop, indexOf    "abasement"      0.0 ms     0 ms     0 ms   0 ms
array.indexOf    "hello"         31.1 ms    25 ms    42 ms   0.0311 ms
regex.test       "hello"        326.8 ms   309 ms   347 ms   0.3268 ms
loop over array  "hello"         59.4 ms    50 ms    69 ms   0.0594 ms
loop, indexOf    "hello"         97.4 ms    92 ms   105 ms   0.0974 ms
Avg. = average, over the 10 runs, of the time taken for the 1000 iterations in a run
Min = time of the fastest run (1000 iterations)
Max = time of the slowest run (1000 iterations)
Iter. = calculated average time for a single iteration (Avg. / 1000)
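The measurement setup described above (10 runs of 1000 iterations each, reporting the average, fastest and slowest run, and a derived per-iteration time) can be sketched as a small TypeScript harness. This is not the code that produced the table, which isn't shown. bench and BenchResult are hypothetical names, the data is a stand-in, and timings from a JavaScript engine will not match the Flash Player figures.

```typescript
// Summary statistics in the same shape as the table; all times in ms.
interface BenchResult {
  avg: number;     // mean time of one run (one run = `iterations` calls)
  min: number;     // fastest run
  max: number;     // slowest run
  perIter: number; // avg / iterations: estimated time of a single call
  hits: number;    // how many calls returned true (keeps the results "used")
}

// Calls `fn` `iterations` times per run, for `runs` runs, timing each run.
// Date.now() only has millisecond resolution, so very fast runs show 0 ms.
function bench(fn: () => boolean, runs: number, iterations: number): BenchResult {
  const times: number[] = [];
  let hits = 0;
  for (let r = 0; r < runs; r++) {
    const start = Date.now();
    for (let i = 0; i < iterations; i++) {
      if (fn()) {
        hits++;
      }
    }
    times.push(Date.now() - start);
  }
  const avg = times.reduce((sum, t) => sum + t, 0) / runs;
  return {
    avg,
    min: Math.min(...times),
    max: Math.max(...times),
    perIter: avg / iterations,
    hits,
  };
}

// Stand-in data; the real test used the whitelist up to the words starting with L.
const words: string[] = "abasement|hello|saturday".split("|");
const regex = new RegExp(words.join("|"));

console.log("array.indexOf", bench(() => words.indexOf("hello") >= 0, 10, 1000));
console.log("regex.test", bench(() => regex.test("hello"), 10, 1000));
```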
It's quite clear that looping over the array and comparing each value is faster than using a regex. You could do a fair amount of extra comparison work per item before the loop caught up with the time the regex spends. In any event, we're dealing with fractions of a millisecond for a single lookup, so optimizing this is premature unless you're doing hundreds of lookups in a short period of time. If we were talking optimization, a Vector.<String> might speed things up slightly more compared to Array.
The main point of this whole thing is that, except for relatively complex scenarios, a regex is unlikely to be more efficient than a tailored parser/comparer/lookup, and that goes for all languages. A regex is designed to be a general-purpose tool, not to do things the smartest way in every case (or pretty much any case, for that matter).
Actual answer...
This question led me into finding some interesting AS3 limitations for the first time...
Your regex stops working once it reaches the word "metabrushite". As far as I can tell from various tests, this is where it hits the longest supported length of a regex in AS3: 31391 characters. Any regex longer than that seems to always return false on a call to test(). Note that "hello" appears in the list before "metabrushite", so it's not a matter of truncation. The regex simply fails silently: for example, a regex that should return true for all words still returns false if it's that long.
The limit seems a rather arbitrary number, so it's hard to tell exactly what determines it.
Again, you really shouldn't be using a regex for a task like this, but if you feel you have to, you'll need to split it into several regexes, none of which exceeds the maximum length.
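Splitting the whitelist into several regexes can be sketched as follows, in TypeScript rather than ActionScript so it can run on its own. The sketch assumes that the 31391-character limit reported above applies to the regex source string, that the words contain no regex metacharacters, and that no single word is longer than the limit. buildChunkedRegexes and testAny are hypothetical names. JavaScript engines do not have a comparably small limit, so this only mirrors the AS3 workaround.

```typescript
// Assumption: the AS3 limit reported in this answer, applied to the source string.
const MAX_REGEX_SOURCE_LENGTH = 31391;

// Greedily packs words into "a|b|c" alternations whose source stays within
// maxLength characters. Assumes plain words (no regex metacharacters) and
// that no single word is longer than maxLength.
function buildChunkedRegexes(words: string[], maxLength: number): RegExp[] {
  const regexes: RegExp[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + "|" + word;
    if (candidate.length > maxLength && current !== "") {
      // Adding this word would exceed the limit: close the current chunk.
      regexes.push(new RegExp(current));
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") {
    regexes.push(new RegExp(current));
  }
  return regexes;
}

// The input is whitelisted if any of the smaller regexes matches it.
function testAny(regexes: RegExp[], input: string): boolean {
  return regexes.some((re) => re.test(input));
}

// Example with a stand-in list (the real one has ~4500 words).
const chunks = buildChunkedRegexes(["abasement", "hello", "metabrushite"], MAX_REGEX_SOURCE_LENGTH);
console.log(chunks.length);            // 1: this short list fits in one regex
console.log(testAny(chunks, "hello")); // true
```

Each lookup then costs one test() per chunk. This keeps the regex's substring matching, which the "saturdays" case relies on, while keeping every chunk under the limit.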
Side note:
Another interesting thing, which I haven't examined more closely, is that creating the RegExp from a single-statement concatenated string, i.e.:
trace("You'll never see this traced if too many words are added below.");
var s:String = "firstword|" +
"secondword|" +
... +
"lastword";
... will fail for even shorter resulting strings. This seems to be due to a maximum length imposed on a single statement, and has nothing to do with regexes. It doesn't freeze, and it doesn't output an error or even the first trace. The script is simply excluded from the SWF without warning and hence never executed.