sometimesiwritecode

Reputation: 3223

How can I read a .json file that has comments in it with Python?

First, I understand that comments aren't valid JSON. That said, the .json file I have to process has comments both at the start of lines and at the end of lines.

How can I load this .json file in Python while ignoring the comments, so that I can process it? I am currently doing the following:

import json

with open('/home/sam/Lean/Launcher/bin/Debug/config.json', 'r') as f:
    config_data = json.load(f)

But this crashes at the json.load(f) call because the file has comments in it.

I thought this would be a common problem, but I can't find much online about how to handle it in Python. Someone suggested commentjson, but as soon as I import it my script crashes with:

ImportError: cannot import name 'dump'

Thoughts?

Edit: Here is a snippet of the JSON file I must process.

{
  // this configuration file works by first loading all top-level
  // configuration items and then will load the specified environment
  // on top, this provides a layering affect. environment names can be
  // anything, and just require definition in this file. There's
  // two predefined environments, 'backtesting' and 'live', feel free
  // to add more!

  "environment": "backtesting",// "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"

  // algorithm class selector
  "algorithm-type-name": "BasicTemplateAlgorithm",

  // Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
  "algorithm-language": "CSharp"
}

Upvotes: 13

Views: 10191

Answers (5)

We use a powerful JSON preprocessor to solve this problem. Besides comments, it also supports:

  • importing (nested) JSON files
  • ${variable} syntax to reference previously defined variables
  • Python syntax (True, False, None, …)
  • . (dot) syntax for dictionary objects

Download: JsonPreprocessor (PyPI)

This allows common definitions and hierarchical structures for huge projects.

We also use a VS Code plugin for JSONP syntax: test-fullautomation/vscode-jsonp (github.com)

Upvotes: 0

Audrius Meškauskas

Reputation: 21778

Switch to json5. JSON5 is a very small superset of JSON that supports comments and a few other features you can simply ignore.

import json5 as json
# and the rest is the same

It is beta, and it is slower, but if you just need to read a short configuration file once at program startup, it is probably a reasonable option. It is better to switch to another standard than to follow none.

Upvotes: 12

MegaIng

Reputation: 7886

You can take out the comments with the following:

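import re
# Strip // line comments, then /* ... */ block comments (DOTALL lets them span lines).
data = re.sub(r"//.*", "", data)
data = re.sub(r"/\*.*?\*/", "", data, flags=re.DOTALL)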

This should remove all comments from the data. It can cause problems if // or /* appears inside your strings (a URL value, for example).
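One way to sidestep the strings problem, sketched here as an assumption rather than a tested library: let the regex match string literals as well as comments, and only drop the matches that are comments. The names TOKEN and strip_comments and the sample data are made up for illustration.

```python
import json
import re

# Match a JSON string literal first, so comment markers inside it are never
# treated as comments; otherwise match a // line comment or a /* */ block comment.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'   # double-quoted string, including escaped characters
    r'|//[^\n]*'           # line comment up to (not including) the newline
    r'|/\*.*?\*/',         # block comment, possibly spanning several lines
    re.DOTALL,
)

def strip_comments(text):
    """Remove // and /* */ comments, leaving string literals untouched."""
    def keep_strings(match):
        token = match.group(0)
        # Strings are put back unchanged; comments are replaced by nothing.
        return token if token.startswith('"') else ""
    return TOKEN.sub(keep_strings, text)

# Hypothetical sample: the URL contains both // and /* inside a string.
sample = '''{
  // a comment
  "url": "http://example.com/*not a comment*/", /* block
  comment */
  "n": 1 // trailing
}'''

config = json.loads(strip_comments(sample))
print(config)
```

Because the string alternative comes first, the regex consumes a whole string before it can ever see a // or /* inside it.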

Upvotes: 0

napster

Reputation: 394

I haven't used it personally, but you can take a look at the JSONComment Python package, which supports parsing a JSON file with comments. Use it in place of JsonParser:

import json
from jsoncomment import JsonComment
parser = JsonComment(json)
parsed_object = parser.loads(jsonString)

Upvotes: 2

Jean-François Fabre

Reputation: 140266

Kind of a hack (it fails if // appears inside the JSON data itself), but simple enough for most cases:

import json,re

s = """{
  // this configuration file works by first loading all top-level
  // configuration items and then will load the specified environment
  // on top, this provides a layering affect. environment names can be
  // anything, and just require definition in this file. There's
  // two predefined environments, 'backtesting' and 'live', feel free
  // to add more!

  "environment": "backtesting",// "live-paper", "backtesting", "live-interactive", "live-interactive-iqfeed"

  // algorithm class selector
  "algorithm-type-name": "BasicTemplateAlgorithm",

  // Algorithm language selector - options CSharp, FSharp, VisualBasic, Python, Java
  "algorithm-language": "CSharp"
}
"""

result = json.loads(re.sub(r"//.*", "", s))

print(result)

gives:

{'environment': 'backtesting', 'algorithm-type-name': 'BasicTemplateAlgorithm', 'algorithm-language': 'CSharp'}

The regular expression is applied to every line and removes the double slash and everything that follows it.

A state machine that parses each line would be better, since it could make sure the // isn't inside quotes, but that is slightly more complex (though doable).
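A minimal sketch of that state-machine idea, under a few assumptions: only // comments are handled, strings use double quotes with backslash escapes as in JSON, and the function name strip_line_comments and the "docs" key are invented for the example.

```python
import json

def strip_line_comments(text):
    """Drop // comments while tracking whether we are inside a string."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character too, so \" does not end the string.
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

# Hypothetical input: one value contains // inside a string.
s = '''{
  // algorithm class selector
  "environment": "backtesting", // "live-paper", "backtesting"
  "docs": "https://example.com/a//b"
}'''

result = json.loads(strip_line_comments(s))
print(result)
```

The quotes inside the trailing comment are never examined, because once a // is seen outside a string the rest of the line is skipped.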

Upvotes: 7
