Reputation: 77
I have about 500 JSON files with comments in them. Trying to update a field in a JSON file with a new value throws an error. I managed to use commentjson to handle line comments like // some text, and the JSON file then updates without errors.
But there are about 100 JSON files with comments like this:
/*
1. sometext.
i. sometext
ii. sometext
2. sometext
*/
commentjson just crashes when a /* exists. If I remove the /* ... */ comment manually and run the code, it works: the file updates and any // comments are handled. How can I write some code to deal with /* and all the text between /* and */?
This is my current code, which can handle //:
with open(f"{i['Location']}\\{file_name}",'r') as f:
json_info = commentjson.load(f) #Gets info from the json file
json_info['password'] = password
with open(f"{i['location_Daily']}\\{file_name}",'w') as f:
commentjson.dump(json_info,f,indent = 4) #updates the password
print("updated")
Upvotes: 2
Views: 1223
Reputation: 1123520
You have a few options:
1. Read the whole file into a string, then use a regular expression to pre-process the text. E.g.:
import re
import commentjson

with open(...) as f:
    json_text = f.read()

# remove everything from '/*' to '*/', where each character in between is
# either a '*' that is *not* followed by '/', or any character that is not '*'
without_comments = re.sub(r"/\*(?:\*(?!/)|[^*])*\*/", "", json_text)
json_info = commentjson.loads(without_comments)
Note that this approach is not going to work if there are also JSON strings with /* and */ inside of them. A regex is not a JSON parser.
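For illustration (a made-up input, not from the question), this is the kind of string value the regex would mangle:

import re

block_comment = r"/\*(?:\*(?!/)|[^*])*\*/"
# the '/*' inside the string value is data, not a comment, but the regex
# cannot tell the difference and strips it together with the real comment
text = '{"note": "/* keep me */", "x": 1 /* real comment */}'
print(re.sub(block_comment, "", text))
# prints: {"note": "", "x": 1 }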
2. Try to update the parser that the commentjson project uses to parse the JSON. Looking at the project's source code, it uses the Lark parsing library, so you could monkey-patch the module with an additional grammar rule.
I note that the main branch already has a grammar rule defining multi-line comments:
COMMENT: "/*" /(.|\\n)+?/ "*/"
| /(#|\\/\\/)[^\\n]*/
but that is not yet part of their release. You can, however, re-use that rule:
from commentjson import commentjson as implementation
from lark.reconstruct import Reconstructor

serialized = implementation.parser.serialize()
for tok in serialized["parser"]["lexer_conf"]["tokens"]:
    if tok["name"] != "COMMENT":
        continue
    if tok["pattern"]["value"].startswith("(#|"):
        # only supports `#` or `//` comments, add block comments
        tok["pattern"]["value"] = r'(?:/\*(?:\*(?!/)|[^*])*\*/|(#|\/\/)[^\n]*)'
    break
implementation.parser = implementation.parser.deserialize(serialized, None, None)
I used my own regex in that grammar update rather than the version used by the project.
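If you go this route, a quick sanity check (my own example input; this assumes commentjson.loads picks up the patched module-level parser) could look like this:

import commentjson

# run the monkey-patching snippet above first; block comments should then parse
data = commentjson.loads('''
{
    /* a block
       comment */
    "password": "old"  // a line comment
}
''')
print(data)  # expected: {'password': 'old'}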
3. Find a different library to parse the input. There are several options that claim to support parsing JSON with this extended syntax; I have not tried any of them, nor do I have anything to say about their usability or performance.
Upvotes: 2
Reputation: 10709
You can use another library such as json5 or pyjson5, or anything else that supports JSON5:
import json5
import pyjson5

data = '''
{
    "something": [
        ["any"],
        ["thing", "here", 10] // This is comment 1
    ],
    /* While this
       is
       comment 2 */
    "car": [
        ["and", "another", "here"], /* Last comment */
    ]
}
'''

print(json5.loads(data))
print(pyjson5.loads(data))
Output
$ python3 script.py
{'something': [['any'], ['thing', 'here', 10]], 'car': [['and', 'another', 'here']]}
{'something': [['any'], ['thing', 'here', 10]], 'car': [['and', 'another', 'here']]}
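For the update loop in the question, a hedged sketch using json5 (assuming json5.load and json5.dump mirror the standard json API, and keeping the variable names i, file_name and password from the question) might look like this. Note that the comments are lost when the file is written back out:

import json5

with open(f"{i['Location']}\\{file_name}", 'r') as f:
    json_info = json5.load(f)  # tolerates both // and /* ... */ comments

json_info['password'] = password

with open(f"{i['location_Daily']}\\{file_name}", 'w') as f:
    json5.dump(json_info, f, indent=4)  # comments are not preserved on write

print("updated")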
Upvotes: 6