Randomblue

Reputation: 116343

Understanding "global" RegExp

The output I expect for "test".match(/[a-z]{0,}/g); should contain '', 't', 'e', 's', 't', 'te', 'es', 'st', etc.

However I only get '' and 'test' from the console. What's happening here?

Upvotes: 2

Views: 148

Answers (6)

falsarella

Reputation: 12447

If what you are looking for is every substring of the word, here is some code I put together:

<html>
<body>

<script type="text/javascript">
function removeDuplicateElement(arrayName) {
    var newArray=new Array();
    label: for(var i=0; i<arrayName.length;i++){
        for(var j=0; j<newArray.length;j++) {
            if(newArray[j]==arrayName[i]) 
                continue label;
        }
        newArray[newArray.length] = arrayName[i];
    }
    return newArray;
}

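var all=new Array();
var str="test";
// Drop one leading character per pass so every start offset gets scanned.
for (;str.length>0;str=str.substring(1,str.length)) {
    for (var i = 0; i<=str.length;i++){
        // {i} matches exactly i letters; match() returns null when nothing matches.
        var patt1=new RegExp("[a-z]{"+i+"}", "g");
        all=all.concat(str.match(patt1) || []);
    }
}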

document.write(removeDuplicateElement(all));
</script>

</body>
</html>

For 'test', it returns ',t,e,s,te,st,tes,test,es,est'.

Upvotes: 2

jabclab

Reputation: 15042

If you want to create this functionality yourself you could try something like:

String.prototype.fullMatch = function () {
    var matches = [""]; // "" is always a match

    function do_regex(str, startAt) {
        var len = str.length,
            i,
            j,
            regex,
            all_matches = [];

        for (i = startAt; i < len; i++) {
            regex = new RegExp("[a-z]{" + (i + 1) + "}", "g");
            // match() returns null when nothing matches (e.g. non-letter input)
            all_matches = str.match(regex) || [];
            for (j = 0; j < all_matches.length; j++) {
                matches.push(all_matches[j]);
            }
        }
    }

    for (var k = 0; k < this.length; k ++) {
        do_regex(this.substring(k), k);
    }

    return matches;  
};

console.log("test".fullMatch()); // ["", "t", "e", "s", "t", "te", "st", "tes", "test", "es", "est"]

Upvotes: 1

stew

Reputation: 11366

With the g flag, match returns an array of matches, each starting after the end of the previous match. If you instead use "test".match(/[a-z]/g); you would get ["t", "e", "s", "t"] as a result. "t" is matched, the next thing that matches after "t" is "e"...

In your query, the regex matches the entire string, so it emits "test". After that, the empty string that follows "test" also matches (since you used {0,} instead of {1,}).
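
A short sketch of that behaviour, assuming any standard JavaScript console (the variable names are only for illustration):

```javascript
// Each global match starts where the previous one ended, so matches never overlap.
var singles = "test".match(/[a-z]/g);        // one letter per match
var greedy = "test".match(/[a-z]{0,}/g);     // whole word, then the empty string at the end
var atLeastOne = "test".match(/[a-z]{1,}/g); // {1,} cannot match the empty string

console.log(singles);    // ["t", "e", "s", "t"]
console.log(greedy);     // ["test", ""]
console.log(atLeastOne); // ["test"]
```

Switching to {1,} only drops the trailing empty match; it does not produce the shorter substrings.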

Upvotes: 1

Jan Pöschko

Reputation: 5580

Matching a string against a global regular expression gives you non-overlapping matches, and because the quantifier here is greedy, each match is as long as possible. The first longest match is the entire string "test", and then the empty string "" remains, which is a match, too. You could achieve what you want by matching against several regular expressions with different length specifiers, like so:

"test".match(/[a-z]{0}/g);
"test".match(/[a-z]{1}/g);
"test".match(/[a-z]{2}/g);
"test".match(/[a-z]{3}/g);
"test".match(/[a-z]{4}/g);

Of course, this should be done more elegantly; you could construct those regular expressions dynamically using new RegExp("[string]"), for instance. Still, this would not yield "es", which your question does list, because matches of the same length never overlap; you would have to work around that separately.
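
As a rough sketch of the dynamic construction, assuming the same [a-z] character class as in the question (matchesByLength is a made-up helper name, not a built-in):

```javascript
// Hypothetical helper: run one global match per exact length 0..str.length.
function matchesByLength(str) {
    var result = [];
    for (var n = 0; n <= str.length; n++) {
        var regex = new RegExp("[a-z]{" + n + "}", "g");
        // match() returns null when nothing matches, so fall back to [].
        result.push(str.match(regex) || []);
    }
    return result;
}

console.log(matchesByLength("test"));
// [["", "", "", "", ""], ["t", "e", "s", "t"], ["te", "st"], ["tes"], ["test"]]
```

The length-2 row holds "te" and "st" but not "es", since each match resumes after the previous one ends.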

Upvotes: 1

maerics

Reputation: 156552

You only get ["test", ""] because the quantifier {0,} matches zero or more letters (just like *) and is greedy (just like *), so it first takes as many letters as it can find and then matches the empty string left at the end.
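
To see the greediness in isolation, here is a small comparison; the lazy form *? is an addition for contrast and is not something the question uses:

```javascript
var braces = "test".match(/[a-z]{0,}/g); // greedy, written with braces
var star = "test".match(/[a-z]*/g);      // the same quantifier, written as *
var lazy = "test".match(/[a-z]*?/g);     // lazy: takes as few letters as possible

console.log(braces); // ["test", ""]
console.log(star);   // ["test", ""]
console.log(lazy);   // ["", "", "", "", ""]
```

The lazy version settles for an empty match at each of the five positions, so neither form yields the intermediate substrings.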

Upvotes: 1

fge

Reputation: 121790

Your regex matches the full text in the first pass and an empty string in the second pass; that is why.

Unfortunately, what you want requires that the regex engine support the \G anchor (IIRC), which ECMA 262 regexes don't have.
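
One possible workaround that stays within ECMAScript regexes is to capture inside a zero-width lookahead. This is only a sketch: it assumes an engine with String.prototype.matchAll (ES2020), and overlapping is a made-up helper name.

```javascript
// Hypothetical helper: the lookahead consumes nothing, so the engine retries
// at every position, and the capture group holds the n-letter substring there.
function overlapping(str, n) {
    var regex = new RegExp("(?=([a-z]{" + n + "}))", "g");
    return Array.from(str.matchAll(regex), function (m) {
        return m[1];
    });
}

console.log(overlapping("test", 2)); // ["te", "es", "st"]
console.log(overlapping("test", 3)); // ["tes", "est"]
```

Calling it once per length from 1 to the string's length would collect every substring, including the overlapping ones a plain global match skips.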

Upvotes: 4
