Reputation: 810
I have a file that contains multiple JSON objects that are not separated by commas:
{
"field" : "value",
"another_field": "another_value"
} // no comma
{
"field" : "value"
}
Each object on its own is a valid JSON object.
Is there an easy way to process this file?
In .NET there is a library with this exact feature: https://stackoverflow.com/a/29480032/2970729 https://www.newtonsoft.com/json/help/html/P_Newtonsoft_Json_JsonReader_SupportMultipleContent.htm
Is there an equivalent in Ruby?
Upvotes: 1
Views: 987
Reputation: 14162
If you know the data will be valid JSON documents, you can use this method to split the string up into documents, and then parse each document.
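def split_documents(str)
  res = []
  depth = 0
  start = 0
  # Match a brace or a whole string literal, so braces inside strings are ignored.
  # \\. consumes any escaped character, so both \" and \\ are handled.
  str.scan(/([{}]|"(?:\\.|[^"\\])*")/) do |match|
    if match[0] == '{'
      depth += 1
    elsif match[0] == '}'
      depth -= 1
      if depth == 0
        match_start = Regexp.last_match.begin(0)
        res << str[start..match_start]
        start = match_start + 1
      end
    end
  end
  res
end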
This scans the string for {, }, or string literals. Each time it hits a {, it increases the depth by 1. Each time it hits a }, it decreases the depth by 1. Every time the depth returns to zero, you have reached the end of a document, because the braces are balanced. The regex also has to match strings so that it doesn't accidentally count braces inside them, e.g. { "foo": "ba}r" }.
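To show how the depth-counting idea fits together with parsing, here is a minimal sketch that yields each document already parsed instead of returning raw strings. The helper name each_json_object and the sample input are made up for illustration. It assumes every top-level document is an object ({...}) and that the input is otherwise valid JSON.

```ruby
require 'json'

# Hypothetical helper (not part of any library): walks the string, tracks
# brace depth, and yields each balanced top-level object parsed by JSON.parse.
def each_json_object(str)
  depth = 0
  start = nil
  # Match a brace or a whole string literal; string literals are matched
  # only so that braces inside them are skipped.
  str.scan(/[{}]|"(?:\\.|[^"\\])*"/) do |token|
    pos = Regexp.last_match.begin(0)
    case token
    when '{'
      start = pos if depth.zero? # remember where this document begins
      depth += 1
    when '}'
      depth -= 1
      yield JSON.parse(str[start..pos]) if depth.zero?
    end
  end
end

# Illustrative input: two concatenated objects, one with a brace in a string.
input = <<~JSON
  { "field": "value", "nested": { "k": "ba}r" } }
  { "field": "other" }
JSON

objects = []
each_json_object(input) { |obj| objects << obj }
```

Recording the start position only when the depth is zero means that whitespace or stray text between documents is never passed to JSON.parse.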
Upvotes: 0
Reputation: 7098
The yajl-ruby gem enables processing concatenated JSON in Ruby. The parser can read from a String or an IO. Each complete object is yielded to a block.
require 'yajl'
File.open('file.json') do |f|
  Yajl.load(f) do |object|
    # do something with object
  end
end
See the documentation for other options (buffer size, symbolized keys, etc).
Upvotes: 0
Reputation: 106792
As long as your file is that simple (flat objects, with no nested objects and no braces inside strings), you could do something like this:
# content = File.read(filename)
content = <<-EOF
{
"field" : "value",
"another_field": "another_value"
} // no comma
{
"field" : "value"
}
EOF
require 'json'
JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
#=> [{"field"=>"value", "another_field"=>"another_value"}, {"field"=>"value"}]
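The "that simple" caveat matters in practice. The substitution treats any } that is later followed by a { as a document boundary, so nested objects get mangled. The sketch below wraps the same substitution in a hypothetical helper, parse_concatenated, and runs it on two made-up inputs to show where it works and where it breaks.

```ruby
require 'json'

# Hypothetical wrapper around the same gsub-based substitution, for illustration.
def parse_concatenated(content)
  JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
end

# Flat objects: the only "}...{" span is the gap between documents.
flat = %({"a": 1}\n{"b": 2})
flat_result = parse_concatenated(flat)

# Nested object: the inner "}" is treated as a boundary, the outer "}" is
# swallowed by the substitution, and the resulting text is not valid JSON.
nested = %({"a": {"x": 1}}\n{"b": 2})
nested_failed =
  begin
    parse_concatenated(nested)
    false
  rescue JSON::ParserError
    true
  end
```

If the file might contain nested objects, a depth-aware splitter or a streaming parser is the safer choice.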
Upvotes: 1