Alex Amato

Reputation: 1725

Representing a JSON string using a YAML folded > string results in unexpected newline characters

Representing a JSON string with a YAML folded (>) scalar adds unexpected newline characters. Is it possible to represent the example string without introducing a newline after each line of the folded string?

- yaml: >-
    {
      "This" : "is supposed to be a JSON string.",
      "it" : "is not meant to be a yaml map itself.",
      "And": "if this string were passed into"
      "a": "json parser, it would THEN parse as a JSON map."
      "The": "confusing part to me is the newlines."
      "Normally": ">- folded style strings do not introduce"
      "newline": "characters between lines within the yaml"
      "folded": "There shouldn't be any newlines added."
      "But": "several unexpected newlines are introduced."
      "Does": "This have something to do with the special"
      "characters": "in the string?"
    }

Here is the resulting string. When parsed, a newline was introduced at the end of each line of the folded >- style string.

[ { "yaml": "{\n "This" : "is supposed to be a JSON string.",\n "it" : "is not meant to be a yaml map itself.",\n "And": "if this string were passed into"\n "a": "json parser, it would THEN parse as a JSON map."\n "The": "confusing part to me is the newlines."\n "Normally": ">- folded style strings do not introduce"\n "newline": "characters between lines within the yaml"\n "folded": "There shouldn't be any newlines added."\n "But": "several unexpected newlines are introduced."\n "Does": "This have something to do with the special"\n "characters": "in the string?"\n}" } ]

Based on the specs I have been reading, this should not happen. It does not happen for other folded strings, but it does for this example; perhaps it has something to do with the special characters.

I have seen this behavior in at least two YAML parsers (Python's PyYAML and Java's SnakeYAML). It also appears in this YAML parser webapp.

Either I am misunderstanding the spec, or both libraries implement it incorrectly (perhaps intentionally, in order to stay compatible with each other).

Ultimately, I am asking because I want to use YAML for config files in a project, but I am concerned that I won't be able to represent multiline strings exactly as I want them (without introducing unexpected spaces, newlines, etc.).

Upvotes: 2

Views: 899

Answers (1)

flyx

Reputation: 39638

The part of the spec you linked directly addresses this:

Lines starting with white space characters (more-indented lines) are not folded.

Examples 8.10 and 8.11 in the spec show that more-indented lines are not folded. In your YAML code, every line except { and } is more indented, so the line breaks around those lines are preserved instead of being folded.

The background is that folded block scalars are meant to let you write things like a bullet list, e.g.
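To make the rule concrete, here is a rough Python sketch of how folding treats text lines versus more-indented lines. It is a simplified model written for illustration, not a YAML parser: the function name `fold_lines` is made up, the input is assumed to be the block content with its base indentation already removed, and leading and trailing blank lines are simply dropped (roughly what `>-` does at the end).

```python
def fold_lines(block: str) -> str:
    """Simplified model of YAML '>' folding for an already-dedented block.

    Assumption: only the behaviour discussed above is modelled. A line break
    between two text lines becomes a space; line breaks next to a
    more-indented line are kept as they are.
    """
    out = []
    prev = None   # previous non-empty line, if any
    blanks = 0    # empty lines seen since prev
    for line in block.splitlines():
        if not line.strip():
            blanks += 1
            continue
        if prev is not None:
            more_indented = prev[0] in " \t" or line[0] in " \t"
            if more_indented:
                # Not folded: keep the line break and every empty line.
                out.append("\n" * (blanks + 1))
            elif blanks:
                # Empty lines between text lines: the first break is dropped.
                out.append("\n" * blanks)
            else:
                # Two adjacent text lines: the break folds into one space.
                out.append(" ")
        out.append(line)
        prev = line
        blanks = 0
    return "".join(out)


if __name__ == "__main__":
    indented = '{\n  "a": 1,\n  "b": 2\n}'
    flat = '{\n"a": 1,\n"b": 2\n}'
    print(repr(fold_lines(indented)))  # inner lines are more indented
    print(repr(fold_lines(flat)))      # all lines share one indentation
```

In this model, the indented variant comes back with its newlines intact, while the variant whose lines all share the same indentation collapses onto a single line. That suggests a workaround without any custom tag: write every line of the JSON at the same indentation as the braces.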

content: >
  foo
 
   * one
   * two

  bar

Those bullets should each parse as a full line, so the rule was established that more-indented lines are not folded. The rule has proven rather detrimental to the use cases that have emerged for YAML, as your example shows.

If you do not want this behavior, I suggest defining a local tag like this:

content: !fold |
  foo
  
   * one
   * two
  
  bar

Then you can write a custom constructor for your tag. It receives the unfolded content of the literal block scalar and can fold the lines in whatever way you want.
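A minimal sketch of such a constructor in Python, assuming PyYAML is the parser in use. The names `fold_text` and `register_fold_tag` are hypothetical, and the folding policy (strip each line, join with single spaces, treat blank lines as paragraph breaks) is just one possible choice, not something the YAML spec prescribes.

```python
import json


def fold_text(text: str) -> str:
    """Fold a literal block: join lines with spaces, ignoring indentation.

    Runs of blank lines act as a single paragraph break and become one
    newline. This policy is an assumption made for this example.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)


def register_fold_tag() -> None:
    """Register a constructor for the local !fold tag (requires PyYAML)."""
    import yaml  # assumption: PyYAML is installed

    def construct_fold(loader, node):
        # construct_scalar returns the literal block content as a string.
        return fold_text(loader.construct_scalar(node))

    yaml.SafeLoader.add_constructor("!fold", construct_fold)


if __name__ == "__main__":
    literal = '{\n  "a": 1,\n  "b": 2\n}\n'
    folded = fold_text(literal)
    print(folded)
    print(json.loads(folded))
```

After `register_fold_tag()` has been called, `yaml.safe_load` should pass any scalar written as `!fold |` through `fold_text`, so the indented JSON arrives as a single line that a JSON parser can consume.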

Upvotes: 2
