Ivanhoe Cheung

Reputation: 80

RegExp doesn't work fine

I'm working on a template engine and trying to capture every string inside <% %>. It breaks as soon as the input contains the <%object.property%> pattern. My code:

var render = function(input, data){
    var re = /<%([^%>]+)?%>/g;
    var templateVarArray;
    // var step = "";
    while((templateVarArray = re.exec(input))!=null){
        var strArray = templateVarArray[1].split(".");
        // step+= templateVarArray[1]+" ";
        if(strArray.length==1)
            input = input.replace(templateVarArray[0], data[templateVarArray[1]]);
        if(strArray.length==2){
            input = input.replace(templateVarArray[0], data[strArray[0]][strArray[1]]);
        }
    }
    // return step;
    return input;
}
var input = "<%test.child%><%more%><%name%><%age%>";

document.write(render(input,{
    test: { child: "abc"},
    more: "MORE",
    name:"ivan",
    age: 22


}));

My result:

abc<%more%><%name%>22

What I want is: abc MORE ivan 22

Also, I found the RegExp /<%([^%>]+)?%>/g online. I searched for what it means, but I'm still not sure, especially why it needs "+" and "?". Thanks a lot!

Upvotes: 0

Views: 83

Answers (2)

rasmeister

Reputation: 1996

If you add a console.log() statement it will show where the next search is going to take place:

while((templateVarArray = re.exec(input))!=null){
    console.log(re.lastIndex);    // <-- insert this
    var strArray = templateVarArray[1].split(".");
    // step+= templateVarArray[1]+" ";
    if(strArray.length==1)
        input = input.replace(templateVarArray[0], data[templateVarArray[1]]);
    if(strArray.length==2){
        input = input.replace(templateVarArray[0], data[strArray[0]][strArray[1]]);
    }
}

You will see something like:

14
26

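This means that the next call to re.exec(...) starts at index 14, and the one after that at index 26. Those indexes were recorded before the replacement, but each replacement shortens input, so the search resumes past tags that have not been processed yet. Consequently, you miss some of the matches after you substitute data in.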
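
To see the skip in isolation, here is a minimal sketch that runs only the first two steps of the loop by hand, using the regex and input from the question. The variable names first, second and indexAfterFirst are just for illustration.

```javascript
// Minimal reproduction of the skipped matches, using the question's regex and input.
var re = /<%([^%>]+)?%>/g;
var input = "<%test.child%><%more%><%name%><%age%>";

var first = re.exec(input);          // matches "<%test.child%>"
var indexAfterFirst = re.lastIndex;  // 14, measured against the original string

// The replacement shortens the string by 11 characters...
input = input.replace(first[0], "abc");   // "abc<%more%><%name%><%age%>"

// ...but the regex still resumes at index 14, which now falls inside "<%name%>".
var second = re.exec(input);         // matches "<%age%>", skipping two tags

console.log(indexAfterFirst, second[0], re.lastIndex);
```

The second match is <%age%>, so <%more%> and <%name%> are never visited, which is exactly the output in the question.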

As @Alexander points out, take the 'g' off the end of the regex. Every logged value is now 0, so you will see something like this:

0
0

This means the search will start each time from the beginning of the string, and you should now get what you were looking for:

abcMOREivan22

Regarding your questions on the RegEx and what it is doing, let's break the pieces apart:

<% - this matches the literal '<' followed immediately by '%'

([^%>]+) - the parentheses (...) indicate we want to capture the portion of the string that matches the expression inside them
  [^...] - matches any single character except the ones listed after the '^'; without the '^' it would match any one of the characters listed inside the []
  [^%>] - matches a single character that is neither '%' nor '>'
  [^%>]+ - '+' means one or more; in other words, match a run of one or more characters, none of which is '%' or '>'

? - placed after the closing parenthesis, this makes the whole captured group optional (zero or one occurrence); it would only mean reluctant (lazy) matching if it came directly after a quantifier, as in '+?'

%> - this matches the literal '%' followed immediately by '>'

The trickiest part to understand is the '?'. Because it follows the group rather than the '+', it does not switch between greedy and reluctant matching. It only allows the regex to match an empty tag such as <%%>, in which case the captured group is undefined and templateVarArray[1].split(".") would throw. For the input in the question it makes no difference whether you include it.
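
A quick way to check what the trailing '?' does is to try the pattern with and without it on an empty tag, and to compare that with a genuinely lazy quantifier. This is only a sketch; the variable names and the "<%a%><%b%>" test string are made up for the comparison.

```javascript
// The '?' after the group makes the group optional; it is not a lazy quantifier.
var withOptionalGroup = /<%([^%>]+)?%>/;
var withRequiredGroup = /<%([^%>]+)%>/;

// An empty tag matches only when the group is optional.
var emptyTag = withOptionalGroup.exec("<%%>");  // emptyTag[1] is undefined
var noMatch = withRequiredGroup.exec("<%%>");   // null

// Lazy matching needs '?' directly after a quantifier such as '+'.
var greedy = /<%(.+)%>/.exec("<%a%><%b%>")[1];  // "a%><%b"
var lazy = /<%(.+?)%>/.exec("<%a%><%b%>")[1];   // "a"

console.log(emptyTag[0], emptyTag[1], noMatch, greedy, lazy);
```

Since [^%>] can never run past a closing %>, greedy versus lazy does not matter for the question's pattern in the first place.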

Suggested Improvement

The current logic is limited to data that nests at most two levels deep. To handle arbitrary nesting, you could do this:

First, add a small function to do the substitution:

var substitute = function (str, data) {
  return str.split('.').reduce(function (res, item) {
    return res[item];
  }, data);
};

Then, change your while loop to look like this:

  while ((templateVarArray = re.exec(input)) != null) {
    input = input.replace(templateVarArray[0], substitute(templateVarArray[1], data));
  }

Not only does it handle any number of levels, you might find other uses for the 'substitute()' function.
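
As a rough usage sketch, here is substitute() called on its own. The data object and its keys (user, address, city, phone) are hypothetical and not part of the question. One thing to be aware of: a missing key at the last level returns undefined, but a missing key in the middle of the path throws.

```javascript
// Same helper as above, repeated so the example runs on its own.
var substitute = function (str, data) {
  return str.split('.').reduce(function (res, item) {
    return res[item];
  }, data);
};

// Hypothetical sample data, three levels deep.
var data = { user: { address: { city: "Hong Kong" } }, age: 22 };

var city = substitute("user.address.city", data);  // "Hong Kong"
var age = substitute("age", data);                 // 22
var phone = substitute("user.phone", data);        // undefined (last key missing)

// A missing intermediate key means reading a property of undefined, which throws.
var threw = false;
try {
  substitute("user.phone.mobile", data);
} catch (e) {
  threw = e instanceof TypeError;
}

console.log(city, age, phone, threw);
```

If templates may reference keys that are not in the data, you would probably want to guard against undefined inside the reduce callback.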

Upvotes: 1

Alexander Art

Reputation: 1589

The RegExp.prototype.exec() documentation says:

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property).

But you are replacing each match in the original string, so the next re.exec call, with lastIndex already set to a non-zero value, does not search from the beginning and skips some of the tags.

So if you want to search and substitute the found results in the original string, just omit the g (global) flag:

var render = function(input, data) {
  var re = /<%([^%>]+)?%>/;
  var templateVarArray;
  // var step = "";
  while (!!(templateVarArray = re.exec(input))) {
    var strArray = templateVarArray[1].split(".");
    if (strArray.length == 1)
      input = input.replace(templateVarArray[0], data[templateVarArray[1]]);
    if (strArray.length == 2) {
      input = input.replace(templateVarArray[0], data[strArray[0]][strArray[1]]);
    }
  }
  // return step;
  return input;
}
var input = "<%test.child%><%more%><%name%><%age%>";

document.write(render(input, {
  test: {
    child: "abc"
  },
  more: "MORE",
  name: "ivan",
  age: 22
}));
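
For comparison, here is a sketch of a different technique from the loop above: String.prototype.replace with the g flag and a callback. It makes a single pass over the original string, so replacements never shift positions the regex still has to visit. The names renderOnce and lookup are hypothetical, and leaving unknown tags untouched is an assumption on my part, not something the question asks for.

```javascript
// Hypothetical helper: walk a dotted path, returning undefined if any step is missing.
var lookup = function (path, data) {
  return path.split(".").reduce(function (res, key) {
    return res == null ? undefined : res[key];
  }, data);
};

// Hypothetical single-pass variant of render().
var renderOnce = function (input, data) {
  return input.replace(/<%([^%>]+)%>/g, function (whole, path) {
    var value = lookup(path, data);
    // Assumption: leave unknown tags as they are instead of printing "undefined".
    return value === undefined ? whole : String(value);
  });
};

var output = renderOnce("<%test.child%><%more%><%name%><%age%>", {
  test: { child: "abc" },
  more: "MORE",
  name: "ivan",
  age: 22
});

console.log(output);
```

Because the replacement is a function, a '$' in the data is inserted literally rather than being treated as a replacement pattern.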

Upvotes: 0
