Nasseh

Reputation: 439

How to detect that a JavaScript file only contains JSON data, or is INTENDED to contain only JSON data?

Consider that we have a file called configuration.js, and when we look inside we see:

'use strict';
var profile = {
    "project": "%ProjectsRoot%\\SampleProject\\Site\\Site.csproj",
    "projectsRootKey": "%ProjectsRoot%",
    "ftp": {
        "address": "ftp://192.168.40.50/",
        "username": "",
        "password": ""
    },
    "delete": [
        "\\b(bin)\\b.*\\.config",
        "\\b(bin)\\b.*\\.js",
        "\\b(bin)\\b.*\\.css",
        "bin\\\\(?!ProjectName).*\\.(dll|pdb)"
    ],
    "replace": [
        {
            "file": "Web.config",
            "items": [
                {
                    "regex": "(<appSettings file=\")(bin\\\\)(Settings.config\">)",
                    "newValue": "$1$3"
                },
                {
                    "regex": "<remove\\s*segment=.bin.\\s/>",
                    "newValue": ""
                }
            ]
        }
    ]
};

In this case, the content of the .js file is intended to be only JSON, yet for IDE-related reasons it is written as a JavaScript statement so that the IDE recognizes the content and formats it correctly. In another scenario, the file might contain:

{
  "project": "%ProjectsRoot%\\SampleProject\\Site\\Site.csproj",
  "projectsRootKey": "%ProjectsRoot%",
  "ftp": {
    "address": "ftp://192.168.40.50/",
    "username": "",
    "password": ""
  },
  "delete": [
    "\\b(bin)\\b.*\\.config",
    "\\b(bin)\\b.*\\.js",
    "\\b(bin)\\b.*\\.css",
    "bin\\\\(?!ProjectName).*\\.(dll|pdb)"
  ],
  "replace": [
    {
      "file": "Web.config",
      "items": [
        {
          "regex": "(<appSettings file=\")(bin\\\\)(Settings.config\">)",
          "newValue": "$1$3"
        },
        {
          "regex": "<remove\\s*segment=.bin.\\s/>",
          "newValue": ""
        }
      ]
    }
  ]
}

In both cases, the file would be better off with a .json extension than a .js one. We're creating a quality tool that has many features, one of which is to suggest that the developer change a file's extension based on its content.

In both cases, how can we make sure that the file only contains JSON, or is INTENDED to only contain JSON?

Note: the JSON in this example is complex because it is a real-world sample.

Upvotes: 2

Views: 194

Answers (2)

Haroldo_OK

Reputation: 7230

To cover the second case, all you would need to do would be to feed the file to some JSON parser with very strict settings; if it rejects the file, then it won't be a JSON file.

To cover the first one, as long as you're only trying to validate that very specific case, one possibility would be to use a regex to strip both the 'use strict'; var something = at the start and the semicolon at the end, and then pass the cleaned-up text through a JSON parser to see if it is valid JSON.

If you need to handle more complex cases, you could use some JavaScript parser to generate an AST from the file, and then walk through the tree to validate it (say, if it contains a single variable, no functions, no statements, etc). Of course, that would be slightly more complex, though very powerful.

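// Demo of the first two approaches: strict parsing and regex clean-up.
var STRICT_JSON_EXAMPLE = '{"value": "ok"}';
var JSON_LIKE_EXAMPLE = '\'use strict\';\nvar somevar = {"value": "ok"};';
var NON_JSON_EXAMPLE = 'alert("!!!");';

var EXAMPLES = [ STRICT_JSON_EXAMPLE, JSON_LIKE_EXAMPLE, NON_JSON_EXAMPLE ];

// True if the whole text parses as JSON.
function isStrictJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// True if the text is JSON, optionally wrapped as: 'use strict'; var name = <JSON>;
function isJSONLike(text) {
  // [\s\S] is used instead of . so that the value may span several lines
  var regex = /^\s*(['"]use strict['"]\s*;?)?\s*var\s+\w+\s*=\s*([\s\S]*?);?\s*$/;
  var cleanedText = text.replace(regex, '$2');
  return isStrictJSON(cleanedText);
}

// Shows "Strict JSON: true, false, false" and "JSON-like: true, true, false"
alert('Strict JSON: ' + EXAMPLES.map(isStrictJSON).join(', ') +
  '\nJSON-like: ' + EXAMPLES.map(isJSONLike).join(', '));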
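
The AST approach mentioned above could look roughly like the sketch below. It assumes you already have an ESTree-style tree, which is the shape that parsers such as Acorn or Esprima produce; the parsing call itself is left out, and the function names isJsonValue and isJsonLikeProgram are made up for illustration. It accepts a program that consists of optional directives plus a single variable declaration whose initializer is built only from JSON-compatible nodes.

```javascript
// Sketch only: walks an ESTree-style AST (assumed to come from a parser
// such as Acorn or Esprima) and checks that it is "JSON wrapped in a var".

// True if the node is a literal, array or object that JSON could express.
function isJsonValue(node) {
  if (!node) return false; // also rejects holes in arrays
  switch (node.type) {
    case 'Literal':
      // strings, numbers, booleans and null; regex literals are rejected
      return !node.regex && (node.value === null ||
        ['string', 'number', 'boolean'].indexOf(typeof node.value) !== -1);
    case 'UnaryExpression':
      // negative numbers such as -1 are parsed as unary minus + literal
      return node.operator === '-' && node.argument.type === 'Literal' &&
        typeof node.argument.value === 'number';
    case 'ArrayExpression':
      return node.elements.every(function (el) { return isJsonValue(el); });
    case 'ObjectExpression':
      return node.properties.every(function (prop) {
        return prop.type === 'Property' && prop.kind === 'init' &&
          !prop.computed && !prop.method && !prop.shorthand &&
          prop.key.type === 'Literal' && typeof prop.key.value === 'string' &&
          isJsonValue(prop.value);
      });
    default:
      // identifiers, calls, functions, templates, etc. are not JSON
      return false;
  }
}

// True if the program is only directives plus one "var x = <JSON value>;".
function isJsonLikeProgram(ast) {
  if (!ast || ast.type !== 'Program') return false;
  var statements = ast.body.filter(function (stmt) {
    // skip directives such as 'use strict'
    return !(stmt.type === 'ExpressionStatement' &&
      typeof stmt.directive === 'string');
  });
  if (statements.length !== 1) return false;
  var decl = statements[0];
  return decl.type === 'VariableDeclaration' &&
    decl.declarations.length === 1 &&
    isJsonValue(decl.declarations[0].init);
}

// Hand-written AST for: 'use strict'; var profile = {"delete": ["a"]};
var sampleAst = {
  type: 'Program',
  body: [
    { type: 'ExpressionStatement', directive: 'use strict',
      expression: { type: 'Literal', value: 'use strict' } },
    { type: 'VariableDeclaration', kind: 'var', declarations: [
      { type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'profile' },
        init: { type: 'ObjectExpression', properties: [
          { type: 'Property', kind: 'init', computed: false,
            key: { type: 'Literal', value: 'delete' },
            value: { type: 'ArrayExpression',
              elements: [ { type: 'Literal', value: 'a' } ] } }
        ] } }
    ] }
  ]
};

console.log(isJsonLikeProgram(sampleAst)); // true
```

This version is deliberately strict about keys (string literals only). If unquoted keys should still count as "intended to be JSON", the key check could also accept Identifier nodes.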

Upvotes: 1

DIEGO CARRASCAL

Reputation: 2129

You'll have to search for patterns in the content of the file: define what makes a JSON file valid and search for it. Try searching for {"...":"..."}, ignoring spaces and line endings. I used to do something like that in a Word + C# tool created to edit contracts, and after a while the team noticed that pattern recognition was the way to go.

My suggestion is to create patterns for the different file types and suggest the types that get the most matches or, if you have to pick one, the single file type with the most matches.
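
A minimal sketch of that scoring idea, assuming a hand-picked pattern table: the PATTERNS entries and the function names are invented for illustration and would need tuning against real files. Each file type gets a few regexes, every match counts as one point, and the type with the highest total wins.

```javascript
// Hypothetical pattern table: a few regexes that hint at each file type.
var PATTERNS = {
  json: [
    /"[^"\\]*"\s*:/g,   // "key": pairs
    /^\s*[{\[]/g        // text starts with { or [
  ],
  js: [
    /\bfunction\b/g,              // function keyword
    /\b(?:var|let|const)\s+\w+/g, // variable declarations
    /=>/g,                        // arrow functions
    /;\s*$/gm                     // statements ending a line with ;
  ]
};

// Number of times a global regex matches the text.
function countMatches(text, regex) {
  var matches = text.match(regex);
  return matches ? matches.length : 0;
}

// Total match count per file type.
function scoreFileTypes(text) {
  var scores = {};
  Object.keys(PATTERNS).forEach(function (type) {
    scores[type] = PATTERNS[type].reduce(function (sum, regex) {
      return sum + countMatches(text, regex);
    }, 0);
  });
  return scores;
}

// File type with the highest score (the first listed type wins ties).
function guessFileType(text) {
  var scores = scoreFileTypes(text);
  return Object.keys(scores).reduce(function (best, type) {
    return scores[type] > scores[best] ? type : best;
  });
}

console.log(guessFileType('{"a": 1, "b": {"c": 2}}'));                 // json
console.log(guessFileType('function f() { return 1; }\nvar x = f();')); // js
```

Counting matches is only a heuristic, so it makes sense to treat the result as a suggestion to the developer and not as proof that the file is valid JSON.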

Upvotes: 0
