I am experimenting with the Go `yaml.v3` library. The goal is to parse arbitrary YAML (that is, YAML for which I don't have a predefined structure) from a file, including its comments, set or unset any value in the resulting tree, and write it back to the file.
However, I have run into some strange behavior. As the code below shows, if the top-level type passed to `Unmarshal` is `interface{}`, no comments are preserved and the library uses maps and slices to represent the YAML structure. On the other hand, if I use `[]yaml.Node` (in this case), it represents all nodes internally as `yaml.Node` or `[]yaml.Node`. This is more or less what I want, because it preserves comments. However, it is not a general solution, because there are at least two distinct scenarios: the YAML starts either with an array or with a map, and I am not sure how to handle both cases elegantly.
Could you point me in the right direction and explain why the library behaves this way?
```go
package main

import (
	"fmt"
	"reflect"

	"gopkg.in/yaml.v3"
)

// Change this to []yaml.Node and it works with comments.
// Change it to yaml.Node and it does not work.
type Document interface{}

var data string = ` # Employee records
- martin:
    name: Martin D'vloper
    job: Developer
    skills:
      - python
      - perl
      - pascal
- tabitha:
    name: Tabitha Bitumen
    job: Developer
    skills:
      - lisp
      - fortran
      - erlang
`

// toSlice converts any slice value into a []interface{}.
func toSlice(slice interface{}) []interface{} {
	s := reflect.ValueOf(slice)
	if s.Kind() != reflect.Slice {
		panic("toSlice() given a non-slice type")
	}
	ret := make([]interface{}, s.Len())
	for i := 0; i < s.Len(); i++ {
		ret[i] = s.Index(i).Interface()
	}
	return ret
}

func main() {
	var d Document
	err := yaml.Unmarshal([]byte(data), &d)
	if err != nil {
		panic(err)
	}

	slice := toSlice(d)
	fmt.Println(reflect.ValueOf(slice[0]).Kind())
	fmt.Println(reflect.TypeOf(d))
	fmt.Println(reflect.ValueOf(d).Kind())

	output, err := yaml.Marshal(&d)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(output))
}
```
When go-yaml parses a YAML document, it always creates a YAML node tree first. Whether it then converts that node tree into plain Go values depends on the type of the `out` argument passed to `Unmarshal`. Here is a code snippet from the go-yaml sources:
```go
func (d *decoder) unmarshal(n *Node, out reflect.Value) (good bool) {
	// ...
	if out.Type() == nodeType {
		out.Set(reflect.ValueOf(n).Elem())
		return true
	}
	// ...
}
```
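Essentially, go-yaml skips the conversion of the node tree if the supplied argument is a pointer to `yaml.Node`. When your parameter type is `interface{}` or anything other than `yaml.Node`, it performs the conversion. Because this check runs for every value the decoder fills in, not only the top-level one, decoding into `[]yaml.Node` constructs the slice but leaves each element as a `yaml.Node`.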
To preserve comments and allow a map, an array, or even a single scalar value at the top level, just pass a `*yaml.Node` as the second argument to `yaml.Unmarshal`:
```go
var n yaml.Node
err := yaml.Unmarshal(bytes, &n)
```
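The value you get back is a document node (`n.Kind == yaml.DocumentNode`) whose single child, `n.Content[0]`, is the actual top-level node. If an array is at the top level, that child is a sequence node, and the YAML nodes of the array elements are its children (`n.Content[0].Content`).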
> On the other hand, if I use (in this case) `[]yaml.Node` structure, it does represent all nodes internally as `yaml.Node` or `[]yaml.Node`.
That is not accurate. go-yaml lets you leave any sub-tree of your structure as a `yaml.Node`, possibly for later processing. Inside such a node, everything is represented as `yaml.Node`, and a node that is a collection (sequence or mapping) just happens to store its children in a slice (its `Content` field, of type `[]*yaml.Node`). But no node is directly represented as `[]yaml.Node`.
When you deserialize into `[]yaml.Node`, you deserialize the top-level node into a native structure (a slice) while leaving the children unconstructed (loading a YAML node into a native structure is called construction in the spec).
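go-yaml does not really support `type Document yaml.Node` (a type declared this way is a new, distinct type, so the decoder treats it as an ordinary struct rather than as a `yaml.Node`), but if you simply declare `var d yaml.Node`, the comment is preserved as well (`toSlice` obviously no longer works):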
```yaml
- # Employee records
  martin:
    name: Martin D'vloper
    job: Developer
    skills:
        - python
        - perl
        - pascal
- tabitha:
    name: Tabitha Bitumen
    job: Developer
    skills:
        - lisp
        - fortran
        - erlang
```
As you can see, the comment is now in a different position. This is because go-yaml only records, in the `yaml.Node` that represents the list item, that "there was a comment before this list item"; the information about where exactly the comment was located is lost. You should be glad that you have any information about the comment at all: most YAML implementations discard comments much earlier, since the spec says that comments must not convey content information.
You may want to read "I want to load a YAML file, possibly edit the data, and then dump it again. How can I preserve formatting?", which explains in detail why, when, and how information is lost while loading a YAML file. TL;DR: it is impossible (short of essentially doing the parsing yourself) to load a YAML file and dump it back while preserving all formatting; if that is your goal, YAML is the wrong tool for you.