Reputation: 6041
The usual way to safely load a typical single-document YAML file is YAML.safe_load(content).
YAML files can contain multiple documents:
---
key: value
---
key: !ruby/struct
  foo: bar
Loading a YAML file such as this using YAML.safe_load(content) will only return the first document:
{ 'key' => 'value' }
If you split the file and try to safe_load the second document, you will get an exception as expected:
Psych::DisallowedClass (Tried to load unspecified class: Struct)
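For reference, a minimal sketch that reproduces both observations. The constant names MULTI_DOC_YAML and SECOND_DOC_YAML are made up for this illustration, and the exact wording of the exception message may differ between Psych versions:

```ruby
require 'yaml'

# Two-document stream from the question; the second document carries a tag.
MULTI_DOC_YAML = <<~YAML
  ---
  key: value
  ---
  key: !ruby/struct
    foo: bar
YAML

# The second document on its own, as if the file had been split by hand.
SECOND_DOC_YAML = <<~YAML
  key: !ruby/struct
    foo: bar
YAML

# safe_load stops after the first document, so the tagged one is never checked.
first_only = YAML.safe_load(MULTI_DOC_YAML)
puts first_only.inspect

# Loading the tagged document directly is rejected.
begin
  YAML.safe_load(SECOND_DOC_YAML)
rescue Psych::DisallowedClass => e
  puts "#{e.class}: #{e.message}"
end
```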
To load multiple documents you can use YAML.load_stream(content), which returns an array:
[
  { 'key' => 'value' },
  { 'key' => #<struct foo="bar"> }
]
The problem is that there is no YAML.safe_load_stream that would raise exceptions for non-whitelisted data types.
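A short sketch of what that means in practice. TAGGED_STREAM and load_all_unsafely are names invented for this example. As far as I can tell, YAML.load_stream converts every document without class restrictions, so the tagged mapping is instantiated instead of rejected:

```ruby
require 'yaml'

# Same two-document stream as above; the second document is tagged.
TAGGED_STREAM = <<~YAML
  ---
  key: value
  ---
  key: !ruby/struct
    foo: bar
YAML

# Hypothetical wrapper, only here to make the unsafe call explicit.
def load_all_unsafely(content)
  YAML.load_stream(content)
end

documents = load_all_unsafely(TAGGED_STREAM)
puts documents.length
# The tagged value comes back as a Struct instance, with no exception raised.
puts documents.last['key'].is_a?(Struct)
```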
Upvotes: 6
Views: 2348
Reputation: 6041
I wrote a workaround that uses the YAML.parse_stream interface:
Edit: This is now available as the gem yaml-safe_load_stream. Also, the maintainers of Psych (the YAML library in Ruby's stdlib) are looking into adding this feature to the library.
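require 'yaml'

module YAML
  # Loads every document in the stream and raises Psych::DisallowedClass as
  # soon as any node carries a tag. Yields each loaded document if a block is
  # given and returns all loaded documents as an array.
  def safe_load_stream(yaml, filename = nil)
    documents = []
    # Psych 3.1+ takes the filename as a keyword argument.
    parse_stream(yaml, filename: filename) do |document|
      # With a block, parse_stream yields one Psych::Nodes::Document at a time.
      raise_if_tags(document, filename, documents.size + 1)
      documents << document.to_ruby
      yield documents.last if block_given?
    end
    documents
  end
  module_function :safe_load_stream

  # Recursively walks the node tree and raises on the first tagged node.
  def raise_if_tags(node, filename = nil, doc_num = 1)
    if node.respond_to?(:tag) && (tag = node.tag)
      message = "tag #{tag} encountered on line #{node.start_line} column #{node.start_column} of document #{doc_num}"
      message << " in file #{filename}" if filename
      # Older Psych: DisallowedClass.new(klass_name); newer: new(action, klass_name).
      one_arg = Psych::DisallowedClass.instance_method(:initialize).arity == 1
      raise Psych::DisallowedClass.new(*(one_arg ? [message] : ['load', message]))
    end
    return unless node.respond_to?(:children)

    Array(node.children).each do |child|
      raise_if_tags(child, filename, doc_num)
    end
  end
  module_function :raise_if_tags
  private_class_method :raise_if_tags
end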
With this you can do:
YAML.safe_load_stream(content, 'file.txt')
And get an exception:
Psych::DisallowedClass (Tried to load unspecified class: tag !ruby/struct
encountered on line 1 column 7 of document 2 in file file.txt)
The line numbers returned by .start_line are relative to the start of the document. I didn't find a way to get the line number where the document starts, so I added the document number to the error message.
Unlike YAML.safe_load, it has no class or symbol whitelists and no option to toggle anchors/aliases.
Also, there are legitimate ways to use tags that will probably trigger a false positive with such simplistic tag-presence detection.
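One possible way around the missing whitelists, as a rough sketch only: convert each parsed document with a restricted class loader instead of rejecting every tag. The helper name safe_load_documents is hypothetical. Psych::ClassLoader::Restricted, Psych::ScalarScanner and the Psych::Visitors classes are, to my understanding, what YAML.safe_load builds on internally, so treat them as implementation details that may change between Psych versions:

```ruby
require 'yaml'

# Hypothetical helper, not part of Psych: converts every document in a stream
# using a restricted class loader, one fresh visitor per document.
def safe_load_documents(yaml, permitted_classes: [], permitted_symbols: [], aliases: false)
  Psych.parse_stream(yaml).children.map do |document|
    class_loader = Psych::ClassLoader::Restricted.new(
      permitted_classes.map(&:to_s), permitted_symbols.map(&:to_s)
    )
    scanner = Psych::ScalarScanner.new(class_loader)
    # NoAliasRuby rejects aliases; ToRuby resolves them.
    visitor_class = aliases ? Psych::Visitors::ToRuby : Psych::Visitors::NoAliasRuby
    visitor_class.new(scanner, class_loader).accept(document)
  end
end

# Plain documents load normally.
puts safe_load_documents("---\nkey: value\n---\nlist: [1, 2]\n").length

# A tagged document is rejected unless its class is explicitly permitted.
begin
  safe_load_documents("---\nkey: value\n---\nkey: !ruby/struct\n  foo: bar\n")
rescue Psych::DisallowedClass => e
  puts e.class
end
```

This parses the whole stream before converting anything, so it trades the streaming behaviour of the block form for simplicity.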
Upvotes: 3