Mateo

Reputation: 197

JavaScript parser for a specific purpose

I am trying to create a tool that looks for missing translations in .html files. Some of our translations are done at runtime in JS code, and I would like to map these together. Below is an example.

<select id="dropDown"></select>

// js
bindings: {
  "dropDown": function() {
    translate(someValue);
    // translate then set option
  }
}

Above you can see that I have a drop-down whose values are created and translated at runtime. I was thinking that an AST would be the right way to accomplish this. Basically, I need to go through the .html file looking for tags that are missing inline translations (done using {{t value}}) and search the corresponding .js file for runtime translations. Is there a better way to accomplish this? Any advice on a tool for creating the AST?

Upvotes: 0

Views: 556

Answers (1)

Ira Baxter

Reputation: 95354

I think you want to hunt for patterns in the code. In particular, I think you want to determine, for each HTML select construct, if there is a corresponding, properly shaped JavaScript fragment with the right embedded ID name.

You can do that with an AST, right. In your case, you need an AST for the HTML file (nodes are essentially HTML tags) with sub-ASTs for the script chunks (<script> tags) containing the parsed JavaScript.

For this you need two parsers. The first parses the HTML (nasty only because HTML is a mess) and produces a tree whose script nodes contain just text. The second is a JavaScript parser that you apply to the text blobs under the script tags to produce JavaScript ASTs; ideally, you splice these into the HTML tree to replace the text blob nodes. Now you have a mixed tree: some nodes are HTML, and some subtrees are JavaScript. Ideally the HTML nodes are marked as being HTML, and the JavaScript nodes are marked as JavaScript.
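The splicing step can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the node shape (`lang`, `type`, `text`, `children`) is hypothetical, and `fakeParseJs` is a stand-in for a real JavaScript parser, which the caller would supply.

```javascript
// Hypothetical node shape: { lang, type, attrs, text, children }.
// Nothing here is a real parser API; it only shows the splice.

// Stamp every node of a subtree with the language it came from.
function markLang(node, lang) {
  return {
    ...node,
    lang,
    children: (node.children || []).map((c) => markLang(c, lang)),
  };
}

// Return a copy of the HTML tree in which each script node is replaced
// by the JavaScript subtree obtained from parsing its text.
function spliceScripts(node, parseJs) {
  if (node.lang === "html" && node.type === "script") {
    return markLang(parseJs(node.text), "js");
  }
  return {
    ...node,
    children: (node.children || []).map((c) => spliceScripts(c, parseJs)),
  };
}

// Stand-in "parser" for the demo only: wraps the text in a single node.
const fakeParseJs = (text) => ({
  type: "program",
  source: text.trim(),
  children: [],
});

const htmlTree = {
  lang: "html",
  type: "html",
  children: [
    { lang: "html", type: "select", attrs: { id: "dropDown" }, children: [] },
    { lang: "html", type: "script", text: " bindings: {} ", children: [] },
  ],
};

const mixedTree = spliceScripts(htmlTree, fakeParseJs);
console.log(JSON.stringify(mixedTree, null, 2));
```

The function builds a new tree rather than mutating the old one, so the original HTML tree stays available for error reporting.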

Now you can search the tree for select nodes, pick up the id, and then search all the JavaScript subtrees for the expected structure.

You can code the matching procedurally, but it will be messy:

  for all node
     if node is HTML and nodetype is Select
        then
            functionname=node.getchild("ID").text
            for all jsnode
               if jsnode is JavaScript and jsnode.parent is HTML and nodetype is pair
                  then if jsnode.getchild(left).text is "bindings"
                       then if jsnode.getchild(right)=structure
                          then...   (lots more....)

There's a lot of hair in this. Technically it's just sweat. You have to know (and encode) the precise detail of the tree, climbing up and down its links correctly and checking the node types one by one. If the grammar changes even a tiny bit, this code breaks too; it knows too much about the grammar.
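To make the brittleness concrete, here is a minimal runnable sketch of the procedural check in JavaScript. It assumes a hypothetical mixed tree built by hand; the field names (`lang`, `type`, `key`, `callee`, `attrs`, `children`) are invented for illustration and do not come from any real parser.

```javascript
// Depth-first walk over every node in the (hypothetical) mixed tree.
function* walk(node) {
  yield node;
  for (const child of node.children || []) yield* walk(child);
}

// Does this node look like
//   bindings: { "<id>": function () { ... translate(...) ... } } ?
// Note how much tree detail is hard-coded in the checks below.
function isBindingFor(node, id) {
  if (node.lang !== "js" || node.type !== "pair" || node.key !== "bindings") {
    return false;
  }
  const obj = (node.children || [])[0];
  if (!obj || obj.type !== "object") return false;
  return (obj.children || []).some((p) => {
    const fn = (p.children || [])[0];
    return (
      p.type === "pair" &&
      p.key === id &&
      fn !== undefined &&
      fn.type === "function" &&
      [...walk(fn)].some((n) => n.type === "call" && n.callee === "translate")
    );
  });
}

// Collect the ids of <select> nodes that have no matching bindings entry.
function selectsMissingTranslation(tree) {
  const nodes = [...walk(tree)];
  return nodes
    .filter((n) => n.lang === "html" && n.type === "select")
    .map((n) => n.attrs && n.attrs.id)
    .filter((id) => !nodes.some((n) => isBindingFor(n, id)));
}

// Hand-built demo tree: two selects, only "dropDown" has a binding.
const js = (type, props = {}, children = []) => ({ lang: "js", type, ...props, children });
const demoTree = {
  lang: "html",
  type: "body",
  children: [
    { lang: "html", type: "select", attrs: { id: "dropDown" }, children: [] },
    { lang: "html", type: "select", attrs: { id: "other" }, children: [] },
    js("pair", { key: "bindings" }, [
      js("object", {}, [
        js("pair", { key: "dropDown" }, [
          js("function", {}, [js("call", { callee: "translate" })]),
        ]),
      ]),
    ]),
  ],
};

const missing = selectsMissingTranslation(demoTree);
console.log(missing); // prints [ 'other' ]
```

Every condition in `isBindingFor` depends on the exact tree shape, so changing how pairs or functions are represented would mean rewriting it.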

You can finish this by coding your own parsers from scratch. Lots more sweat.

There are tools that can make this a lot easier; see Program Transformation Systems. Such tools let you define language grammars, and they generate parsers and AST builders for such grammars. (As a general rule, they are pretty good at defining working grammars, because they are designed to be applied to many languages.) That at least puts lots of structure into the process, and they provide a lot of machinery to make this work. But the good part is that you can express patterns, usually in source language surface syntax, which makes the matching much simpler to write.

One of these tools is our DMS Software Reengineering Toolkit (I'm the architect).

DMS has dirty HTML and full JavaScript parsers already, so those don't need to be built. You would have to write a bit of code for DMS to invoke the HTML parser, find the subtrees for script nodes, and apply the JavaScript parser. DMS makes this practical by allowing you to parse a blob of text as an arbitrary nonterminal in a grammar; in this case, you want to parse those blobs as an expression nonterminal.

With all that in place, you can now write patterns that will support the check:

 pattern select_node(property: string): HTML~dirty.HTMLform =
       " <select ID=\property></select> ";

 pattern script(code: string): HTML~dirty.HTMLform =
       " <script>\code</script> ";

 pattern js_bindings(s: string, e:expression):JavaScript.expression =
       " bindings : { \s : function () 
                            { translate(\e);
                            }
                    } ";

While these patterns look like text, they are parsed by DMS into ASTs with placeholder nodes for the parameter list elements, denoted by "\nnnn" inside the (meta)quotes "..." that surround the program text of interest. Such AST patterns can be matched against ASTs; they match if the pattern tree matches, and the pattern variable leaves are then captured as bindings. (See Registry:PatternMatch below, and the resulting match argument with slots matched (a boolean) and bindings (an array of bound subtrees resulting from the match).) A big win for the tool builder: he doesn't have to know much about the fine detail of the grammar, because he writes the pattern, and the tool produces all the tree nodes for him, implicitly.
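The idea of matching a pattern tree with placeholder leaves can be sketched in a few lines of JavaScript. This is not how DMS implements it; it is a minimal illustration in which patterns are written by hand as plain objects (DMS builds them from surface syntax), and the `{ var: "name" }` placeholder convention and node fields are assumptions made up for the example.

```javascript
// A pattern is an ordinary tree in which { var: "name" } marks a placeholder.
// Returns an object of captured subtrees on success, or null on mismatch.
// Object patterns match as a subset: extra keys in the tree are ignored.
// Simplification: a repeated variable is not checked for consistency.
function matchPattern(pattern, tree, bindings = {}) {
  if (pattern !== null && typeof pattern === "object" && "var" in pattern) {
    bindings[pattern.var] = tree; // capture the subtree under this name
    return bindings;
  }
  if (Array.isArray(pattern)) {
    if (!Array.isArray(tree) || tree.length !== pattern.length) return null;
    for (let i = 0; i < pattern.length; i++) {
      if (matchPattern(pattern[i], tree[i], bindings) === null) return null;
    }
    return bindings;
  }
  if (pattern !== null && typeof pattern === "object") {
    if (tree === null || typeof tree !== "object" || Array.isArray(tree)) return null;
    for (const key of Object.keys(pattern)) {
      if (!(key in tree)) return null;
      if (matchPattern(pattern[key], tree[key], bindings) === null) return null;
    }
    return bindings;
  }
  return pattern === tree ? bindings : null; // leaf values must be identical
}

const V = (name) => ({ var: name });

// Hand-written analogue of the js_bindings pattern, with placeholders s and e.
const jsBindingsPattern = {
  type: "pair",
  key: "bindings",
  value: {
    type: "object",
    members: [
      {
        type: "pair",
        key: V("s"),
        value: {
          type: "function",
          body: [{ type: "call", callee: "translate", args: [V("e")] }],
        },
      },
    ],
  },
};

// Hypothetical AST for: bindings: { "dropDown": function () { translate(someValue); } }
const candidate = {
  type: "pair",
  key: "bindings",
  value: {
    type: "object",
    members: [
      {
        type: "pair",
        key: "dropDown",
        value: {
          type: "function",
          body: [
            {
              type: "call",
              callee: "translate",
              args: [{ type: "identifier", name: "someValue" }],
            },
          ],
        },
      },
    ],
  },
};

const result = matchPattern(jsBindingsPattern, candidate);
console.log(result.s, result.e);
```

The caller reads the captured subtrees by name (`result.s`, `result.e`) instead of navigating the tree by hand, which is the point of the pattern approach.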

With these patterns, you can write procedural PARLANSE (DMS's Lisp-style programming language) code to implement the check (liberties taken to shorten the presentation):

(;; `Parse HTML file':
      (= HTML_tree (HTML:ParseFile .... ))
    `Find script nodes and replace by ASTs for same':
       (AST:FindAllMatchingSubtrees HTML_tree
         (lambda (function boolean [html_node AST:Node])
           (let (= [match Registry:Match]
                   (Registry:PatternMatch html_node "script"))
              (ifthenelse match:boolean
                (value (;; (AST:ReplaceNode html_node
                               (JavaScript:ParseStream
                                  "expression" ; desired nonterminal
                                  (make Streams:Stream
                                       (AST:GetString match:bindings:1))))
                       );;
                  ~f ; false: don't visit subtree
                )value
                ~t ; true: continue scanning into subtree
              )ifthenelse
           )let
         )lambda )
     `Now find select nodes, and check sanity':
       (AST:FindAllMatchingSubtrees HTML_tree
         (lambda (function boolean [html_node AST:Node])
           (let (;; (= [select_match Registry:Match] ; capture match data
                       (Registry:PatternMatch html_node "select_node")) ; hunt for this pattern
                    [select_function_name string]
                );;
              (ifthenelse select_match:boolean
                (value (;; `Found <select> node.
                            Get name of function...':
                           (= select_function_name
                             (AST:GetString select_match:bindings:1))

                           `... and search for matching script fragment':
                            (ifthen
                              (~ (AST:FindFirstMatchingSubtree HTML_tree
                                     (lambda (function boolean [js_node AST:Node])
                                       (let (= [match Registry:Match] ; capture match data
                                               (Registry:PatternMatch js_node "js_bindings")) ; hunt for this pattern
                                          (&& match:boolean
                                             (== (AST:GetString match:bindings:1)
                                                 select_function_name)
                                         )&& ; is true if we found matching function
                                     )let
                                   )lambda ) )~
                              (;; `Complain if we cant find matching script fragment'
                                  (Format:SNN `Select #S with missing translation at line #D column #D'
                                     select_function_name
                                     (AST:GetLineNumber select_match:source_position)
                                     (AST:GetColumnNumber select_match:source_position)
                                  )
                              );;
                           )ifthen

                     );;
                  ~f ; don't visit subtree
                )value
                ~t ; continue scanning into subtree
              )ifthenelse
           )let
         )lambda )
);;

This procedural code first parses an HTML source file, producing an HTML tree. All these nodes are stamped as being from the "HTML~dirty" language. It then scans that tree to find SCRIPT nodes, and replaces each with an AST obtained from a JavaScript-expression parse of the node's text content. Finally, it finds all SELECT nodes, picks out the name of the function mentioned in the ID clause, and checks all JavaScript ASTs for a matching "bindings" expression as specified by the OP. All of this leans on the pattern matching machinery, which in turn sits on top of the low-level AST library that provides a variety of means to inspect, navigate, and change tree nodes.

I've left out some detail (esp. error handling code) and obviously haven't tested this. But this gives the flavor of how you might do it with DMS.

Similar patterns and matching processes are available in other program transformation systems.

Upvotes: 4
