Parsing Template without regular expressions?

Well, here is my problem: I have an application that uses a custom JavaScript implementation with no support for regular expressions.

However, I'd like to be able to parse templates nevertheless; preferably using C++.

A template might look like this (ASP-style template):

<% var foo = someFunction("with a string");
   var bar =  anotherFunction(["with", "an", "array"]); %>

<b>This is html, and this is a variable: <%= bar %></b>

<% if(foo) { %>
    <b> foo is 'true'</b>
<% } else { %>
    <b> foo is 'false'. terrible. </b>
<% } %>

So the general structure is pretty simple (and I'd assume, relatively parseable).

My question is: is it possible to parse such a template with a while() loop that goes through each character, instead of using regular expressions?

And since my attempts to do that failed horribly, how could it be done?

Thank you!

Upvotes: 2

Answers (4)

Ira Baxter

Reputation: 95402

Such a template is quite easy to parse.

The key is recognizing that such templates basically consist of a sequence of just two kinds of strings: boilerplate (HTML) text, and script text.

The boilerplate text starts right after a "%>" and ends at the next "<%" (with special cases at the beginning and end of the template). Script text is everything else. Yes, you can pick off both with a while loop for each one that watches for "<%", "%>", or end-of-template. The sequence is implicit in alternating back and forth, which makes for a pretty simple parser:

  while not eof
       boilerplate="";
       while not eof and next_characters~="<%"
          boilerplate concat next_character
       end
       skip "<%" if present
       scripttext="";
       while not eof and next_characters~="%>"
          scripttext concat next_character
       end
       skip "%>" if present
  end

(I leave the details of the individual character management for the coder).
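For the C++ case mentioned in the question, the loop above might be sketched as follows. This is only one possible reading of the pseudocode: `Segment` and `splitTemplate` are names made up for this example, empty segments are dropped, and an unterminated "<%" is assumed to swallow the rest of the input.

```cpp
#include <string>
#include <vector>

// One piece of the template (hypothetical type for this sketch).
struct Segment {
    bool isScript;     // true for text found between "<%" and "%>"
    std::string text;  // content with the delimiters removed
};

// Splits a template into alternating boilerplate and script segments.
std::vector<Segment> splitTemplate(const std::string& input) {
    std::vector<Segment> segments;
    std::size_t pos = 0;
    bool inScript = false;  // a template starts in boilerplate
    while (pos < input.size()) {
        const std::string delimiter = inScript ? "%>" : "<%";
        std::string current;
        // Copy characters until the delimiter or the end of the input.
        while (pos < input.size() && input.compare(pos, 2, delimiter) != 0) {
            current += input[pos++];
        }
        if (pos < input.size()) pos += 2;  // skip the two-character delimiter
        if (!current.empty()) segments.push_back({inScript, current});
        inScript = !inScript;  // alternate between the two kinds of text
    }
    return segments;
}
```

Note that a "%>" inside a script string literal would end the script segment early; handling that would need the scanner to track string quotes as well.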

What you didn't say is what you wanted to do with the parsed result. If the goal is to "generate output" from the parsed result, you'll have to turn it into a program. That's actually pretty easy.

Basically, you write the result to a file and compile (or interpret) it. For each piece of collected boilerplate text, emit a print statement that prints the boilerplate text; you may have to escape characters to make them legal in a string literal in your chosen target language, or break the boilerplate into multiple blocks to print it. For each block of script text, simply emit it unchanged. You'll likely have to emit a prolog text chunk to make a function header, and a postlog text chunk to make a function end.
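A rough C++ sketch of that translation step, under several assumptions: the target language is the application's JavaScript, `print` is a hypothetical output function provided by that environment, the generated function is arbitrarily named `render`, and a script block starting with "=" (as in the question's `<%= bar %>`) is treated as an expression to print.

```cpp
#include <string>

// Escapes text so it can sit inside a double-quoted string literal.
std::string escapeLiteral(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            default:   out += c;
        }
    }
    return out;
}

// Translates a template into script source made of print calls and raw script.
// `print` and `render` are assumed names, not part of any real API.
std::string generateProgram(const std::string& tpl) {
    std::string program = "function render() {\n";  // prolog
    std::size_t pos = 0;
    while (pos < tpl.size()) {
        // Boilerplate runs up to the next "<%" or the end of the template.
        std::size_t open = tpl.find("<%", pos);
        if (open == std::string::npos) open = tpl.size();
        if (open > pos) {
            program += "print(\"" + escapeLiteral(tpl.substr(pos, open - pos)) + "\");\n";
        }
        if (open == tpl.size()) break;
        // Script runs up to the next "%>" or the end of the template.
        std::size_t start = open + 2;
        std::size_t close = tpl.find("%>", start);
        if (close == std::string::npos) close = tpl.size();
        std::string script = tpl.substr(start, close - start);
        if (!script.empty() && script[0] == '=') {
            program += "print(" + script.substr(1) + ");\n";  // <%= expr %>
        } else {
            program += script + "\n";  // emitted unchanged
        }
        pos = (close == tpl.size()) ? close : close + 2;
    }
    program += "}\n";  // postlog
    return program;
}
```

The returned string could then be written to a file or handed directly to the script engine, depending on what the host application allows.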

That's it.

[Because of the trivial conversion between such "templates" and a simple program with print statements, I don't find such template programming to be very enticing. It saves me a few print keywords, and that's it.]

Upvotes: 3

Mikko Ohtamaa

Reputation: 83576

There might already be an existing solution for you.

Logicless template libraries:

https://github.com/leonidas/transparency/wiki/Frequently-Asked-Questions

(See the last question).

Upvotes: 0

madfriend

Reputation: 2430

Have you thought about using a Finite State Machine? Here are some links for you to look at.

In short: an FSM consists of a finite number of states and transitions between these states. So the process of your parsing might be expressed as follows (pseudocode):

myFSM = new FSM( /* states, transitions */ );
// now your FSM is at initial state.

while not end of file {       
  switch (myFSM->currentState) {   
    case 'IF':
      // Does current line contain closing if? or else? If so, do a transition
      // to state that grabs everything in if construct 
      ...
    case 'TEXT': 
      // Lines do not have any lexical constructs, and we are outside any blocks
      ...
    ...

  }
}

Of course, this is hugely simplified. A real parser would look different. But I hope you get the idea.
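To make the idea a bit more concrete, here is a small C++ sketch of a state machine that works character by character rather than line by line. It uses a different, lower-level set of states than the pseudocode above (it only separates text from script blocks and knows nothing about `if`/`else`); the state names, `Chunk`, and `scanWithFsm` are all invented for this example.

```cpp
#include <string>
#include <vector>

// States: plain text, text after a '<', script, script after a '%'.
enum class State { Text, TextSawLt, Script, ScriptSawPercent };

struct Chunk {
    bool isScript;     // true for content between "<%" and "%>"
    std::string text;  // content without the delimiters
};

std::vector<Chunk> scanWithFsm(const std::string& input) {
    std::vector<Chunk> chunks;
    std::string current;
    State state = State::Text;
    // Stores the text collected so far (if any) and starts a new chunk.
    auto flush = [&](bool isScript) {
        if (!current.empty()) chunks.push_back({isScript, current});
        current.clear();
    };
    for (char c : input) {
        switch (state) {
            case State::Text:
                if (c == '<') state = State::TextSawLt;
                else current += c;
                break;
            case State::TextSawLt:
                if (c == '%') { flush(false); state = State::Script; }
                else if (c == '<') current += '<';  // the new '<' may still start "<%"
                else { current += '<'; current += c; state = State::Text; }
                break;
            case State::Script:
                if (c == '%') state = State::ScriptSawPercent;
                else current += c;
                break;
            case State::ScriptSawPercent:
                if (c == '>') { flush(true); state = State::Text; }
                else if (c == '%') current += '%';  // the new '%' may still start "%>"
                else { current += '%'; current += c; state = State::Script; }
                break;
        }
    }
    // End of input: keep a pending half-delimiter, then store the last chunk.
    if (state == State::TextSawLt) current += '<';
    if (state == State::ScriptSawPercent) current += '%';
    flush(state == State::Script || state == State::ScriptSawPercent);
    return chunks;
}
```

Each `case` handles exactly one input character, so every transition is visible in one place; recognizing constructs such as `if`/`else` would mean adding further states on top of these.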

Upvotes: 1

Attila

Reputation: 28772

This is what I would attempt in your case:

write a tokenizer that returns a set of known tokens (with the exact character string they represent attached, e.g.: ID("someFunction"))
write a formal grammar using the above tokens that describes the accepted template formats
write a parser that recognizes the grammar (e.g. a push-down automaton, LR parser or LALR parser)

Note: make sure the grammar conforms to the limitations of the parser you are implementing; if it does not, re-write the grammar or change parsers

Note: make sure you test your parser thoroughly, as errors in the implementation can be hard to debug

Note: during the parsing steps you will need to perform some additional operations as well if you want to get the semantics (meaning) of the parsed template, not just whether it is a valid template. These extra steps involve storing IDs for variables/functions (remember the string attached to the tokens?), looking them up when referenced, checking function parameter numbers, etc.
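As an illustration of the first step only, a hand-written tokenizer without regular expressions could look roughly like the C++ sketch below. The token kinds and the names `Token` and `tokenize` are assumptions made for this example; keywords such as `var` are simply reported as identifiers, string contents are kept raw (escape sequences are not decoded), and only double-quoted strings and plain integers are recognized.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum class TokenKind { Id, Number, String, Punct };

struct Token {
    TokenKind kind;
    std::string text;  // the exact characters the token represents
};

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }  // whitespace separates tokens
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            // Identifier: letter or underscore, then letters, digits, underscores.
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            tokens.push_back({TokenKind::Id, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else if (c == '"') {
            ++i;  // skip the opening quote
            while (i < src.size() && src[i] != '"') {
                if (src[i] == '\\' && i + 1 < src.size()) ++i;  // step over an escaped character
                ++i;
            }
            tokens.push_back({TokenKind::String, src.substr(start + 1, i - start - 1)});
            if (i < src.size()) ++i;  // skip the closing quote
        } else {
            // Anything else becomes a one-character punctuation token.
            tokens.push_back({TokenKind::Punct, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

For the first script line of the question's template, this would yield ID("var"), ID("foo"), a punctuation token for "=", ID("someFunction"), and so on; the grammar and parser from the later steps would then consume that token stream.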

Upvotes: 1
