Reputation:
I need to figure out how to detect obfuscated JavaScript using static analysis alone.
Here is an example of the kind of obfuscated JavaScript I need to detect:
eval(function(p,a,c,k,e,d){e=function(c){return c};if(!''.replace(/^/,String)){while(c--){d[c]=k[c]||c}k=[function(e){return d[e]}];e=function(){return'\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\b'+e(c)+'\b','g'),k[c])}}return p}('0.1("2");',3,3,'document|write|test'.split('|'),0,{}))
My guess is that I could simply count the number of key characters, such as (, ), and |, within a certain number of characters. If this is possible, which characters are the most important?
Upvotes: 0
Views: 436
Reputation: 314
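Simply look for the eval() function. Packed code like your example has to hand the unpacked source to eval() (or an equivalent such as the Function constructor) in order to run it, so a call to eval() is a common static indicator of obfuscated code. Legitimate scripts occasionally use eval() as well, so treat a match as a signal rather than proof.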
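A minimal sketch of that check, assuming a plain regex scan over the source text is acceptable. The function name findDynamicExecution and the pattern list are illustrative assumptions, not an exhaustive or standard set; the "packer-header" pattern only matches the header seen in the sample from the question.

```javascript
// Hypothetical pattern list: each entry flags one way of running code from a string.
const DYNAMIC_EXEC_PATTERNS = [
  { name: 'eval', pattern: /\beval\s*\(/ },
  { name: 'new Function', pattern: /\bnew\s+Function\s*\(/ },
  // setTimeout/setInterval called with a string literal as the first argument
  { name: 'timer with string', pattern: /\bset(?:Timeout|Interval)\s*\(\s*["']/ },
  // Header of the packed sample in the question: eval(function(p,a,c,k,e,d)
  {
    name: 'packer-header',
    pattern: /eval\s*\(\s*function\s*\(\s*p\s*,\s*a\s*,\s*c\s*,\s*k\s*,\s*e\s*,\s*d\s*\)/,
  },
];

// Returns the names of all patterns found in the given source text.
function findDynamicExecution(source) {
  return DYNAMIC_EXEC_PATTERNS
    .filter((entry) => entry.pattern.test(source))
    .map((entry) => entry.name);
}

console.log(findDynamicExecution('eval(function(p,a,c,k,e,d){return p}("x",0,0,[],0,{}))'));
```

Because this is a text scan rather than a parse, it also matches eval( inside comments and string literals, and it misses calls whose name is assembled at runtime.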
Upvotes: 0
Reputation: 121
It is better to measure what share of the script is whitespace and comments (spaces, tabs, newlines, etc.); obfuscated code usually has very little of either.
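A rough sketch of the whitespace measurement, assuming the script is available as a single string. The name whitespaceRatio is hypothetical, and comments are not counted here because stripping them reliably needs a tokenizer rather than a regex.

```javascript
// Fraction of characters in `source` that are whitespace (spaces, tabs, newlines, ...).
// Returns 0 for an empty string to avoid dividing by zero.
function whitespaceRatio(source) {
  if (source.length === 0) return 0;
  const matches = source.match(/\s/g);
  const whitespaceCount = matches ? matches.length : 0;
  return whitespaceCount / source.length;
}

const readable = 'function add(a, b) {\n  return a + b;\n}\n';
const packed = 'eval(function(p,a,c,k,e,d){return p})';
console.log(whitespaceRatio(readable), whitespaceRatio(packed));
```

Any threshold for "too little whitespace" would have to be tuned on real samples, since minified but harmless scripts also score low.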
You can also analyze function and variable names to detect very strange ones, e.g. lrn2fl4ncew or g0034, that are clearly not built from dictionary words.
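One way the name check could be sketched, without a dictionary: flag names that have digits wedged between letters or more digits than letters. Both rules, and the names declaredNames, looksStrange and strangeNameRatio, are assumptions made for illustration; the regex only sees the first name after var/let/const/function, so parameters and comma-separated declarations are missed.

```javascript
// Collects names that directly follow var, let, const or function.
function declaredNames(source) {
  const re = /\b(?:var|let|const|function)\s+([A-Za-z_$][\w$]*)/g;
  return Array.from(source.matchAll(re), (match) => match[1]);
}

// Heuristic (assumption): a name is "strange" if digits sit between letters
// (lrn2fl4ncew) or digits outnumber letters (g0034).
function looksStrange(name) {
  if (name.length < 4) return false; // short names are normal in minified code
  const digits = (name.match(/\d/g) || []).length;
  const letters = (name.match(/[A-Za-z]/g) || []).length;
  const digitBetweenLetters = /[A-Za-z]\d+[A-Za-z]/.test(name);
  return digitBetweenLetters || digits > letters;
}

// Share of declared names that look strange; 0 when nothing is declared.
function strangeNameRatio(source) {
  const names = declaredNames(source);
  if (names.length === 0) return 0;
  return names.filter(looksStrange).length / names.length;
}

console.log(strangeNameRatio('var lrn2fl4ncew = 1; function g0034() {} var total = 2;'));
```

Ordinary names such as i18n also trip the first rule, so the ratio is only useful as one signal among several.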
A third option is to detect the absence of typical keywords, e.g. eval, RegExp, etc. In rogue scripts such keywords are hidden in various ways to prevent easy detection.
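As a sketch of what "hidden in various ways" can look like to a scanner, the snippet below counts a few constructs that are often used to build a keyword at runtime instead of writing it out. The indicator list and the name countHidingIndicators are assumptions chosen for illustration, not a complete catalogue.

```javascript
// Hypothetical indicators of keywords being assembled at runtime.
const HIDING_INDICATORS = {
  // String.fromCharCode(101, 118, 97, 108) style construction
  fromCharCode: /String\s*\.\s*fromCharCode/g,
  // \xNN or \uNNNN escapes written literally in the source text
  escapeSequences: /\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
  // Two short string literals joined with +, e.g. 'ev'+'al'
  concatenatedStrings: /(["'])[^"'\n]{0,8}\1\s*\+\s*(["'])[^"'\n]{0,8}\2/g,
  // Call through a computed property that contains a +, e.g. window['ev'+'al'](...)
  computedCall: /\[\s*[^\]\n]*\+[^\]\n]*\]\s*\(/g,
};

// Returns an object mapping each indicator name to its number of matches.
function countHidingIndicators(source) {
  const counts = {};
  for (const [name, pattern] of Object.entries(HIDING_INDICATORS)) {
    counts[name] = (source.match(pattern) || []).length;
  }
  return counts;
}

console.log(countHidingIndicators("window['ev'+'al'](String.fromCharCode(97))"));
```

A script that contains none of the usual keywords but scores high on indicators like these is a stronger candidate than one that simply lacks eval.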
Counting the percentage of key characters, or just looking for short function/variable names, is not enough: you will get many false positives from scripts that are merely "compressed" (minified without obfuscation).
Upvotes: 2