Reputation: 1105
I'm a little unsure exactly where to point the finger (other than at myself of course)
JSON is a subset of YAML 1.2 http://www.yaml.org/spec/1.2/spec.html "every JSON file is also a valid YAML file"
JSON allows tabs as 'insignificant whitespace' http://www.ietf.org/rfc/rfc4627.txt "Insignificant whitespace is allowed ..."
YAML does not allow tabs for indentation http://www.yaml.org/spec/1.2/spec.html "tab characters must not be used in indentation"
So I use my YAML parser to process the JSON below:
{
\t"result" : "success"
}
NOTE: the \t is just to visualize, the input contains a real tab character.
Hits an error 'not allowed to use tab for indenting' <- which seems correct.
But then how does the "every JSON file is also a valid YAML file" rule hold, given that my file is valid JSON?
Since the tab character is meaningless here, should I just run a pre-processing step to strip out all tabs? As the only whitespace allowed in JSON strings is the space (a tab inside a string must be escaped as \t), it should be safe to strip out all tabs in the file.
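A minimal JavaScript sketch of the pre-processing step proposed above, assuming the input is meant to be strict JSON (the function name stripTabsFromJson is hypothetical). It relies on JSON.parse rejecting raw TAB characters inside string literals, so any TAB in a text that parses must be insignificant whitespace. It replaces each TAB with a space rather than deleting it, then re-parses to confirm the value did not change.

```javascript
// Hypothetical helper: replace every TAB in a strict-JSON text with a space.
// Safe only for valid JSON: JSON.parse rejects unescaped TABs inside strings,
// so any TAB that survives parsing must be insignificant whitespace.
function stripTabsFromJson(text) {
  const before = JSON.parse(text); // throws SyntaxError if text is not valid JSON
  const cleaned = text.replace(/\t/g, " ");
  const after = JSON.parse(cleaned);
  // Sanity check: the parsed value must be identical before and after.
  if (JSON.stringify(before) !== JSON.stringify(after)) {
    throw new Error("tab replacement changed the parsed value");
  }
  return cleaned;
}

const input = '{\n\t"result" : "success"\n}'; // contains a real TAB character
console.log(JSON.stringify(stripTabsFromJson(input)));

// A raw TAB inside a string literal is not valid JSON, which is why the
// replacement above cannot alter string content.
try {
  JSON.parse('"a\tb"');
} catch (e) {
  console.log(e instanceof SyntaxError);
}
```

Replacing with a space instead of deleting is a conservative choice: in valid JSON, tokens are already separated by structural characters, so either works, but a space can never merge two tokens.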
Upvotes: 3
Views: 3527
Reputation: 76632
JSON compatibility was only added in version 1.2 of the YAML specification. Implementing that compatibility on top of a parser originally designed for YAML 1.1 is not trivial.
The tab character has no fixed width in spaces; how it is displayed when editing depends on your editor's settings (or defaults). In practice this means you should not use tab characters at all in block style, and most parsers don't allow them in flow style either.
So your input should be accepted by your parser, as it is by ruamel.yaml>=0.17.24 (when using the pure-Python implementation). If it isn't, you could filter the tabs out, but only at the beginning of lines, and only if you know TAB is not used in literal- or flow-style scalars.
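A sketch of that narrower filter, written in JavaScript like the other code in this thread and assuming, as required above, that no TAB occurs inside literal- or flow-style scalars. It only touches the leading whitespace of each line; the function name replaceLeadingTabs and the default of two spaces per TAB are illustrative choices, not taken from any library.

```javascript
// Hypothetical helper: replace TABs only in the leading whitespace of each line.
// Assumes no TAB appears inside literal- or flow-style scalars.
function replaceLeadingTabs(text, spacesPerTab = 2) {
  // With the "m" flag, ^[ \t]+ matches the leading whitespace of every line;
  // [ \t] excludes "\n", so a match never spans two lines.
  return text.replace(/^[ \t]+/gm, (lead) =>
    lead.replace(/\t/g, " ".repeat(spacesPerTab))
  );
}

// The TAB after the colon is not at the start of a line, so it is left alone.
const json = '{\n\t"result" :\t"success"\n}';
console.log(JSON.stringify(replaceLeadingTabs(json)));
```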
If the JSON is generated automatically, adapt the generator to use spaces.
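If the generator happens to be JavaScript (an assumption; the question does not say), JSON.stringify already takes an indentation argument: a number means that many spaces per nesting level. It also writes any TAB inside a string value as the two-character escape \t, so the output contains no raw TAB at all. The data object below is a made-up stand-in for whatever the generator produces.

```javascript
// Example data standing in for the generator's output (hypothetical).
const data = { result: "success", note: "col1\tcol2" };

// The third argument (2) indents each nesting level with two spaces, not TABs.
const text = JSON.stringify(data, null, 2);
console.log(text);
// false: the TAB in "note" is serialized as the escape sequence \t
console.log(text.includes("\t"));
```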
Upvotes: 2
Reputation: 4569
For what it's worth, we always use TABs for indentation - IMHO, it's the only logical choice. So, using YAML was a real problem. Not wanting to modify existing YAML parsers (bad thing to do), I wrote the following JavaScript function to "untabify" strings, the result of which can be fed to a YAML parser:
function untabify(str, indent = ' ') {
  // --- Keep track of line numbers for error messages.
  // --- The counter is local to each call (0-based), so numbering restarts for every string.
  let lineNumber = -1;
  // --- With the "m" flag, ^ matches at the start of every line;
  // --- p1 captures the leading TABs and p2 any spaces that follow them.
  return str.replace(/^(\t*)(\x20*)/gm, function (match, p1, p2) {
    ++lineNumber;
    // --- It's an error for space characters to appear in indentation
    if (p2.length > 0) {
      throw new Error("Space character not allowed in indentation on line "
        + lineNumber);
    }
    // --- Replace each leading TAB with one copy of `indent`
    return indent.repeat(p1.length);
  });
} // untabify()
With it, you can simply do this (apart from dropping the line-number tracking, I'm not sure how to improve on it):
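// --- Assumes a YAML library exposing YAML.parse(), e.g. the "yaml" npm package
var YAML = require('yaml');

// --- Each \t below is a TAB used for indentation
var str = `
---
- StudentName
- StudentCode
- Age
- DateOfBirth
- Gender
-
\t- lCustomData
\t- Name
\t- Value
-
\t- lRaces
\t- bPrimaryRace
\t- Race
\t- RaceCode
-
\t- lGoals
\t- iGoal
\t- Goal
\t- BeginDate
\t- EndDate
\t-
\t\t- lObjectives
\t\t- iObjective
\t\t- Objective
`;
var lStudentFields = YAML.parse(untabify(str));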
It throws an error if any line's indentation contains spaces, which includes mixing TABs and spaces. Also note that line numbers start at 0. In my usual usage, as in the example, the backtick string begins with an empty line, which is line 0, so the '---' in the example is in fact line 1. Not highly tested, but pretty simple and clear; use at your own risk.
Upvotes: 0
Reputation: 39708
Hits an error 'not allowed to use tab for indenting' <- which seems correct.
It is not.
This is the relevant production in the spec:
[140] c-flow-mapping(n,c) ::= “{” s-separate(n,c)? ns-s-flow-map-entries(n,in-flow(c))? “}”
Here, s-separate(n,c) resolves to s-separate-lines(n) (because we are not inside block-key or flow-key). Skipping some steps, we reach s-separate-in-line, which allows tab characters.
The bottom line is that this tab character in your JSON is not indentation. Indentation is only relevant in block style (i.e. when not using [ or { for sequences and mappings, respectively). In flow style, whitespace is only used for separation.
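To make the block/flow distinction concrete, here is a deliberately simplified JavaScript scanner (hypothetical, not taken from any YAML library) that reports, for each line beginning with a TAB, whether that TAB sits inside a { } or [ ] flow collection, where it is only separation, or outside one, where a YAML parser would treat it as indentation. It understands quoted strings but ignores comments, block scalars and other YAML features, so treat it as a sketch for JSON-like input only.

```javascript
// Simplified sketch: for each line that begins with a TAB, report whether the
// TAB is inside a flow collection ({ } or [ ]). Handles quoted strings only;
// comments, block scalars, and plain scalars containing brackets are ignored.
function classifyLeadingTabs(text) {
  const results = [];
  let depth = 0;     // current nesting depth of { } and [ ]
  let quote = null;  // active quote character, or null outside strings
  let line = 0;      // 0-based line number
  let atLineStart = true;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "\n") {
      line++;
      atLineStart = true;
      continue;
    }
    if (atLineStart && ch === "\t" && quote === null) {
      results.push({ line, inFlow: depth > 0 });
    }
    atLineStart = false;
    if (quote !== null) {
      if (quote === '"' && ch === "\\") i++; // skip the escaped character
      else if (ch === quote) quote = null;   // closing quote
      continue;
    }
    if (ch === '"' || ch === "'") quote = ch;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") depth--;
  }
  return results;
}

// The question's JSON: the TAB is inside a flow mapping, so it is separation.
console.log(classifyLeadingTabs('{\n\t"result" : "success"\n}'));
// Block style: the same TAB would be (forbidden) indentation.
console.log(classifyLeadingTabs("a:\n\tb: 1"));
```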
Edit: Removed example link because it was somewhat misleading.
Edit 2: To answer your second question: No, do not strip tabs. They may be content inside scalars! See this example where a tab character actually determines the indentation of a block scalar.
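A small JavaScript illustration of why blanket stripping is unsafe for YAML in general, in contrast to strict JSON, where raw TABs cannot appear inside strings: the TABs below are part of a literal block scalar's content, and removing them silently changes the data. The blockLines helper is hypothetical and only works for this example's fixed layout (scalar starting on the second line, two spaces of indentation); it is not a YAML parser.

```javascript
// A YAML document whose literal block scalar (|) contains TABs as content.
const doc = "table: |\n  name\tage\n  alice\t30\n";

// Hypothetical helper: return the content lines of the block scalar, assuming it
// starts on the second line and is indented by exactly two spaces (true here only).
function blockLines(text) {
  return text
    .split("\n")
    .slice(1)
    .filter((l) => l.startsWith("  "))
    .map((l) => l.slice(2));
}

console.log(blockLines(doc));                    // the TABs are preserved as content
console.log(blockLines(doc.replace(/\t/g, ""))); // naive stripping merges the columns
```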
Upvotes: 3