Reputation: 456
I have a program that logs every GET/POST request made by a website during the page load process. I want to go through these requests one by one, execute them, and then determine whether the file that was returned is JavaScript. Given that it won't have a .js ending (because of scripts like this, yanked from google.com a minute ago), how can I parse the file returned by the request and identify whether it is a JavaScript file?
Thanks!
EDIT:
It is better to get a false positive than a false negative. That is, I would rather have some non-JS included in the JS list than cut some real JS from the list.
Upvotes: 1
Views: 163
Reputation: 1111
The JavaScript link that you referred to has neither a content type nor a .js extension. Any text file can be considered JavaScript if it can be executed (that is, if it parses as valid JavaScript), which makes detection from scratch very difficult. There are two methods that come to mind.
Run a linter on the file contents. If it reports a syntax or parsing error, the file is not JavaScript. If it reports no syntax or parsing errors, the file should be considered JavaScript.
Parse the file contents into an AST (abstract syntax tree). A JavaScript file will parse without errors. There are a number of JavaScript parser libraries available. I haven't worked with JS ASTs, so I can't recommend a particular one, but a quick search should give you some options.
I am not sure, but a linter probably builds an AST before doing its syntax checks anyway. If so, parsing to an AST directly seems like the lighter option.
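A minimal sketch of the parse-only idea in TypeScript, without any parser library. This swaps in a different technique from an AST library or linter: it asks the JavaScript engine itself to compile the text with `new Function(source)`, which throws a `SyntaxError` for invalid code and never calls the resulting function. The helper name `looksLikeJavaScript` is hypothetical. Assumptions: it runs in Node.js or in a page whose Content-Security-Policy allows `new Function`. Because the text is compiled as a function body, a top-level `return` is accepted, while ES module syntax (`import`/`export`) and top-level `await` are rejected. The regex fallback therefore leans toward counting files with module syntax as JavaScript, matching the preference for false positives over false negatives.

```typescript
// Sketch: classify a response body as JavaScript by compiling it, not running it.
// Assumptions: Node.js or a browser context whose CSP permits `new Function`;
// `looksLikeJavaScript` is a hypothetical helper name.

// Lines that start with ES module syntax, which `new Function` cannot compile.
const MODULE_SYNTAX = /^\s*(?:import|export)\b/m;

function looksLikeJavaScript(source: string): boolean {
  try {
    // Parses and compiles `source` as a function body; the function is never called.
    new Function(source);
    return true;
  } catch (err) {
    if (!(err instanceof SyntaxError)) {
      throw err; // e.g. an EvalError from a CSP: no verdict either way
    }
    // Heuristic fallback (an assumption, chosen because false positives are
    // acceptable): treat files containing module syntax as JavaScript anyway.
    return MODULE_SYNTAX.test(source);
  }
}

// Example usage on a few response bodies.
console.log(looksLikeJavaScript("var x = 1; function f(a) { return a * 2; }")); // true
console.log(looksLikeJavaScript("<!DOCTYPE html><html></html>")); // false
console.log(looksLikeJavaScript('{"a": 1}')); // false (a JSON object is not a valid statement)
```

A JSON array such as `[1, 2]` would still pass as JavaScript, which is acceptable under the "false positives are fine" rule.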
Upvotes: 1
Reputation: 5514
The easiest way would be to check whether anything in the URI identifies JavaScript files, because the alternatives are a lot heavier. But since you said this isn't an option, you can always check the syntax of each file's contents with some heuristic tool. You can also check the Content-Type response header.
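A sketch of the cheaper checks mentioned here, meant to run before falling back to a syntax check: a URI hint, the Content-Type header, and a quick look at the body. The function `quickClassify`, its three-way result, and the rejection rules are illustrative assumptions. The MIME list is a subset of the JavaScript MIME types in the WHATWG MIME Sniffing Standard. Whenever the checks are unsure, the function returns `"unknown"` rather than `"not-javascript"`, so that real scripts served as `text/plain` or `application/octet-stream` are not dropped.

```typescript
// Sketch: cheap checks on the URI, Content-Type header and body before a full parse.
// Assumptions: `quickClassify` and its three-way result are hypothetical names;
// you already have the URL, the raw Content-Type header (or null) and the body text.

type Verdict = "javascript" | "not-javascript" | "unknown";

// A subset of the JavaScript MIME types listed in the WHATWG MIME Sniffing Standard.
const JS_MIME_TYPES = new Set([
  "text/javascript",
  "application/javascript",
  "application/x-javascript",
  "application/ecmascript",
  "text/ecmascript",
]);

function quickClassify(url: string, contentType: string | null, body: string): Verdict {
  // 1. URI hint: a .js or .mjs path, with query string and fragment removed.
  const path = url.replace(/[?#].*$/, "").toLowerCase();
  if (path.endsWith(".js") || path.endsWith(".mjs")) return "javascript";

  // 2. Content-Type header, ignoring parameters such as "; charset=UTF-8".
  if (contentType !== null) {
    const mime = contentType.replace(/;.*$/, "").trim().toLowerCase();
    if (JS_MIME_TYPES.has(mime)) return "javascript";
    // Only reject types that clearly are not scripts; real JS is sometimes
    // served as text/plain or application/octet-stream.
    if (mime.startsWith("image/") || mime.startsWith("font/") || mime === "text/css") {
      return "not-javascript";
    }
  }

  // 3. Body sniffing: NUL characters suggest binary data, and a leading "<"
  // (other than an HTML-like "<!--" comment, which classic scripts allow)
  // suggests HTML or XML.
  if (body.includes("\u0000")) return "not-javascript";
  if (/^\s*<(?!!--)/.test(body)) return "not-javascript";

  // Not decided: hand the body to a syntax check such as a parser or linter.
  return "unknown";
}

console.log(quickClassify("https://example.com/app.js?v=3", null, "")); // javascript
console.log(quickClassify("https://example.com/page", "text/html", "<!DOCTYPE html>")); // not-javascript
console.log(quickClassify("https://example.com/x", null, "var a = 1;")); // unknown
```

Only the `"unknown"` responses need the heavier syntax check, which keeps the expensive step to the files the headers and URI cannot settle.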
Upvotes: 0