Reputation:
I'm writing a JSON parser and I'm having trouble coming up with a good design for the error handling. Let's say that at some point the lexical analyser finds a token with a lexical error in it. How should it react? Should it stop right away or continue to the end of the string? How do parsers generally handle lexical errors?
Upvotes: 1
Views: 1137
Reputation: 241881
It depends on the purpose of the application, but in most cases a JSON parser should stop at the first error.
JSON is a data interchange format. In most applications, the input was originally created programmatically, and a syntax error indicates a corrupted communication or a buggy generator. If the encoded data had been stored in a database, it could indicate corrupted storage. It might even be an indication of an attack: an attempt to modify the data en route or to hand-craft problematic data.
In such cases, the best strategy is usually to simply drop the data rather than attempting to "fix" it. Returning some kind of detailed error message is discouraged, because (a) the originating application is unlikely to be able to handle such a response, and (b) it might give an attacker additional information. Handling the incorrect data by attempting to guess what the correct representation might be could silently hide bugs in the application generating the data.
Of course, for debugging and logging purposes it is useful to be able to provide more informative error reports. Even then, it is rarely if ever useful to proceed beyond the first error.
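As a rough illustration of that split between what gets logged and what gets returned, here is a minimal sketch using Python's standard `json` module. The names `parse_strict`, `InvalidPayload` and the logger name `json_intake` are made up for the example; only `json.loads` and the `msg`, `lineno` and `colno` attributes of `json.JSONDecodeError` come from the standard library.

```python
import json
import logging

logger = logging.getLogger("json_intake")  # hypothetical logger name


class InvalidPayload(Exception):
    """Deliberately uninformative error handed back to the caller."""


def parse_strict(text):
    """Decode text as JSON, stopping at the first error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # The detailed report (message and position) goes to the log only.
        logger.warning(
            "rejected JSON: %s (line %d, column %d)",
            err.msg, err.lineno, err.colno,
        )
        # The sender only learns that the data was dropped.
        raise InvalidPayload("invalid JSON") from None
```

The decoder stops at the first problem it meets, so the log entry describes exactly one error, which is usually all that is needed to track down a buggy generator.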
It is possible that the application intends to use JSON as a human-editable data descriptor, for example as a configuration file. In that case, being able to find and report multiple errors may be useful, as it would be in the parser for a programming language. (But even then, it is not necessary.)
Upvotes: 2
Reputation: 3265
In the case of lexical errors, you should go on evaluating the whole string. Showing more than one error at a time can be helpful.
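One possible way to do that, sketched in Python: the lexer records each bad run of characters with its offset and then resumes at the next whitespace or punctuation character instead of stopping. The function name `lex_all`, the token kinds and the choice to resynchronize on delimiters are assumptions of this example, not something prescribed by JSON.

```python
import re

# One alternative per token kind; whitespace is matched so it can be skipped.
TOKEN_RE = re.compile(r'''
    (?P<ws>[ \t\r\n]+)
  | (?P<punct>[{}\[\]:,])
  | (?P<string>"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*")
  | (?P<number>-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<literal>true|false|null)
''', re.VERBOSE)

# A bad run extends up to the next whitespace or punctuation character.
BAD_RUN_RE = re.compile(r'[^ \t\r\n{}\[\]:,]+')


def lex_all(text):
    """Return (tokens, errors); lexing continues after each error."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # Lexical error: record offset and offending text, then resume.
            bad = BAD_RUN_RE.match(text, pos)
            errors.append((pos, bad.group()))
            pos = bad.end()
            continue
        if match.lastgroup != "ws":
            tokens.append((match.lastgroup, match.group(), pos))
        pos = match.end()
    return tokens, errors


tokens, errors = lex_all('{"a": tru, "b": @1}')
for offset, fragment in errors:
    print(f"lexical error at offset {offset}: {fragment!r}")
```

Skipping the whole bad run rather than a single character keeps one mistake (such as `tru`) from being reported once per character.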
In the case of syntax errors, there are two ways:
Safe sequences are language-dependent constructs that should be valid independently of the context. Resynchronizing on one does not fix the error itself, but it allows the parser to go on and report any further errors. This is how compilers behave: when there are multiple errors, they can detect and report most of them.
A safe sequence in JSON could be, for example, a correct object definition. Something like this (EBNF):
{ <key>:<value>[, <key>:<value>] }
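A very reduced sketch of how a parser might resynchronize, in Python. It only handles a flat object whose values are scalars, treats `,` and `}` as the points where parsing can safely resume, and uses a deliberately simplified tokenizer (strings without escapes). The names `tokenize` and `check_flat_object` are hypothetical, and this is not a complete JSON validator: trailing commas, for example, are not checked.

```python
import re

PUNCT = {"{", "}", "[", "]", ":", ","}
SYNC = {",", "}"}  # tokens at which parsing can safely resume


def tokenize(text):
    # Simplified: strings without escapes, punctuation, bare words/numbers.
    return re.findall(r'"[^"]*"|[{}\[\]:,]|[^\s{}\[\]:,"]+', text)


def check_flat_object(tokens):
    """Check { "key": value, ... } and collect every malformed member."""
    errors = []
    if not tokens or tokens[0] != "{":
        return ["token 0: expected '{'"]
    i = 1
    while i < len(tokens) and tokens[i] != "}":
        start = i
        member = tokens[i:i + 3]
        well_formed = (
            len(member) == 3
            and member[0].startswith('"')
            and member[1] == ":"
            and member[2] not in PUNCT
        )
        if well_formed:
            i += 3
        if not well_formed or (i < len(tokens) and tokens[i] not in SYNC):
            errors.append(f"token {start}: malformed member")
        # Recovery: skip ahead to the next ',' or '}' and carry on.
        while i < len(tokens) and tokens[i] not in SYNC:
            i += 1
        if i < len(tokens) and tokens[i] == ",":
            i += 1
    if i >= len(tokens):
        errors.append("missing closing '}'")
    return errors


for message in check_flat_object(tokenize('{"a": 1, b: 2, "c" 3, "d": 4}')):
    print(message)
```

Both the unquoted key `b` and the missing colon after `"c"` are reported in a single pass, because each error only discards tokens up to the next comma.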
I hope it helps.
Upvotes: 0