Kimmo Lehto

Reputation: 6041

How to safely load a YAML file that contains multiple documents?

The usual way to safely load a single-document YAML file is YAML.safe_load(content).

YAML files can contain multiple documents:

---
key: value
---
key: !ruby/struct
  foo: bar

Loading such a file with YAML.safe_load(content) returns only the first document:

{ 'key' => 'value' }

If you split the file and safe_load the second document on its own, you get an exception, as expected:

Psych::DisallowedClass (Tried to load unspecified class: Struct)
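A minimal script reproducing both behaviors described above. The `content` heredoc repeats the example, and the second call passes the second document as its own string, which is one simple way of "splitting" the file. The variable names are only illustrative:

```ruby
require 'yaml'

content = <<~YAML
  ---
  key: value
  ---
  key: !ruby/struct
    foo: bar
YAML

# safe_load stops after the first document in the stream.
first = YAML.safe_load(content)
p first

# The second document on its own is rejected, because Struct is not a
# permitted class.
error = begin
  YAML.safe_load("key: !ruby/struct\n  foo: bar\n")
  nil
rescue Psych::DisallowedClass => e
  e
end
puts error.message if error
```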

To load multiple documents, you can use YAML.load_stream(content), which returns an array:

[
  { 'key' => 'value' },
  { 'key' => #<struct foo="bar"> }
]
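For comparison, a short sketch of YAML.load_stream on the same input (again with an illustrative `content` variable). It shows both the array form and the block form, and that the anonymous Struct is instantiated without any complaint:

```ruby
require 'yaml'

content = <<~YAML
  ---
  key: value
  ---
  key: !ruby/struct
    foo: bar
YAML

# Array form: one element per document, including the Struct instance.
documents = YAML.load_stream(content)
documents.each { |doc| p doc }

# Block form: each document is yielded in turn instead of being collected.
YAML.load_stream(content) { |doc| p doc.keys }
```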

The problem is that there is no YAML.safe_load_stream that would raise an exception for data types that are not whitelisted.

Answers (1)

Kimmo Lehto

I wrote a workaround that uses the YAML.parse_stream interface:

Edit: This is now available as the gem yaml-safe_load_stream. Also, the maintainers of Psych (the YAML library in Ruby's stdlib) are looking into adding this feature to the library.

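require 'yaml'

module YAML
  # Parses a multi-document YAML stream and converts each document to Ruby,
  # raising Psych::DisallowedClass if any node carries an explicit tag.
  # Yields each document when a block is given, otherwise returns an array.
  def safe_load_stream(yaml, filename = nil)
    documents = []
    doc_num = 0
    # With a block, parse_stream yields one Psych::Nodes::Document per document.
    # The filename: keyword form requires Psych 3.1 or later.
    parse_stream(yaml, filename: filename) do |document|
      doc_num += 1
      raise_if_tags(document, filename, doc_num)
      if block_given?
        yield document.to_ruby
      else
        documents << document.to_ruby
      end
    end
    documents unless block_given?
  end
  module_function :safe_load_stream

  # Recursively walks a node tree and raises on the first node that has a tag.
  def raise_if_tags(obj, filename, doc_num)
    if obj.respond_to?(:tag) && (tag = obj.tag)
      message = +"tag #{tag} encountered on line #{obj.start_line} column #{obj.start_column} of document #{doc_num}"
      message << " in file #{filename}" if filename
      raise disallowed_class(message)
    end

    return unless obj.respond_to?(:children)

    Array(obj.children).each do |child|
      raise_if_tags(child, filename, doc_num)
    end
  end
  module_function :raise_if_tags
  private_class_method :raise_if_tags

  # Psych 4 changed DisallowedClass#initialize to take (action, klass_name);
  # earlier versions take only klass_name.
  def disallowed_class(message)
    if Psych::DisallowedClass.instance_method(:initialize).arity == 2
      Psych::DisallowedClass.new('load', message)
    else
      Psych::DisallowedClass.new(message)
    end
  end
  module_function :disallowed_class
  private_class_method :disallowed_class
end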

With this you can do:

YAML.safe_load_stream(content, 'file.txt')

And get an exception:

Psych::DisallowedClass (Tried to load unspecified class: tag !ruby/struct
encountered on line 1 column 7 of document 2 in file file.txt)

The line numbers returned by .start_line are relative to the start of the document. I didn't find a way to get the line number where the document itself starts, so I added the document number to the error message.

Unlike YAML.safe_load, it has no class and symbol whitelists and no option to toggle anchors/aliases.
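If you need the whitelists and the alias toggle, one possible sketch is to convert each parsed document with a restricted visitor instead of calling to_ruby. This assumes Psych 3.1 or later and relies on Psych's internal, largely undocumented classes `Psych::ClassLoader::Restricted`, `Psych::ScalarScanner`, `Psych::Visitors::ToRuby` and `Psych::Visitors::NoAliasRuby`, so it may break between Psych versions. The method name `safe_load_stream_with_whitelist` is hypothetical:

```ruby
require 'yaml'

# Hypothetical helper (not part of Psych). Converts every document in a YAML
# stream with a restricted class loader, similar in spirit to YAML.safe_load.
# Relies on Psych internals, so it may break between Psych versions.
def safe_load_stream_with_whitelist(yaml, permitted_classes: [], permitted_symbols: [],
                                    aliases: false, filename: nil)
  # Only the listed classes (by name) and symbols may be instantiated.
  class_loader = Psych::ClassLoader::Restricted.new(
    permitted_classes.map(&:to_s),
    permitted_symbols.map(&:to_s)
  )
  scanner = Psych::ScalarScanner.new(class_loader)
  # NoAliasRuby raises on any alias node; ToRuby resolves aliases normally.
  visitor_class = aliases ? Psych::Visitors::ToRuby : Psych::Visitors::NoAliasRuby

  # Without a block, parse_stream returns the Stream node; its children are documents.
  Psych.parse_stream(yaml, filename: filename).children.map do |document|
    # A fresh visitor per document, so anchors do not carry over between documents.
    visitor_class.new(scanner, class_loader).accept(document)
  end
end

content = <<~YAML
  ---
  key: value
  ---
  key: !ruby/struct
    foo: bar
YAML

begin
  safe_load_stream_with_whitelist(content)
rescue Psych::DisallowedClass => e
  puts e.message
end

p safe_load_stream_with_whitelist("--- :foo\n", permitted_classes: [Symbol])
```

Because the decision is left to the restricted class loader rather than to a blanket tag check, this should behave much like safe_load on each document, including accepting tags that resolve to ordinary strings or numbers.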

Also, some legitimate uses of tags will probably give false positives with such a simplistic check, which rejects any node whose tag is not nil.
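One concrete false positive, sketched with an illustrative `source` string: an explicit core-schema tag such as `!!str` is stored on the node as its full URI, so the tag check would reject it, while YAML.safe_load accepts it:

```ruby
require 'yaml'

source = "value: !!str 123\n"

# YAML.parse returns the first Document node; root is the top-level mapping,
# and children[1] is the scalar holding the value.
scalar = YAML.parse(source).root.children[1]
puts scalar.tag

# safe_load treats the core !!str tag as harmless and returns a String.
p YAML.safe_load(source)
```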
