DotnetProg

Reputation: 810

Read multiple concatenated json objects in Ruby

I have a file that contains multiple JSON objects that are not separated by commas:

{
  "field" : "value",
  "another_field": "another_value"
} // no comma
{ 
  "field" : "value"
}

Each object on its own is a valid JSON object.

Is there a way that I can process this file easily?

  1. I know this is NOT valid JSON, but the file is generated by a third-party tool and I cannot change its output format.
  2. I can't open a text editor and insert commas and square brackets before each run, because this is an automated process (I would also much rather not write code that opens the file and rewrites it).

In .NET there's a library with this exact feature (Json.NET's JsonReader.SupportMultipleContent): https://stackoverflow.com/a/29480032/2970729 https://www.newtonsoft.com/json/help/html/P_Newtonsoft_Json_JsonReader_SupportMultipleContent.htm

Is there anything equivalent in Ruby?

Upvotes: 1

Views: 987

Answers (3)

ICR

Reputation: 14162

If you know the data will be valid JSON documents, you can use this method to split the string up into documents, and then parse each document.

def split_documents(str)
  res = []
  depth = 0
  start = 0
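  # Match a brace or a whole string literal; \\. consumes any escaped
  # character, so a string ending in an escaped backslash closes correctly.
  str.scan(/([{}]|"(?:\\.|[^"\\])*")/) do |match|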
    if match[0] == '{'
      depth += 1
    elsif match[0] == '}'
      depth -= 1
      if depth == 0
        match_start = Regexp.last_match.begin(0)
        res << str[start..match_start]
        start = match_start + 1
      end
    end
  end
  res
end

This scans the string for {, }, or string literals. Each time it hits a {, it increases the depth by 1; each time it hits a }, it decreases the depth by 1. Whenever the depth returns to zero, the braces are balanced, so you have reached the end of a document. The regex also has to match strings so that it doesn't accidentally count braces inside them, e.g. { "foo": "ba}r" }.
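To show how the brace-depth idea might be combined with the actual parsing step, here is a minimal self-contained sketch. The helper name parse_concatenated and the sample input are assumptions made for illustration, and it relies on the input being a sequence of well-formed JSON objects with only whitespace between them.

```ruby
require 'json'

# Hypothetical helper (not part of any library): split a string of
# concatenated JSON objects on balanced top-level braces and parse each
# piece with the standard library's JSON.parse.
def parse_concatenated(str)
  docs = []
  depth = 0
  start = 0
  # Each token is either a single brace or a complete string literal,
  # so braces inside strings never reach the depth counter.
  str.scan(/[{}]|"(?:\\.|[^"\\])*"/) do |token|
    case token
    when '{'
      depth += 1
    when '}'
      depth -= 1
      if depth.zero?
        # Offset just past the closing brace of this document.
        stop = Regexp.last_match.end(0)
        docs << JSON.parse(str[start...stop])
        start = stop
      end
    end
  end
  docs
end

# Sample input (assumed): two objects with no separator, one of which
# has a brace inside a string value.
input = <<~JSON
  {
    "field" : "value",
    "another_field": "another_value"
  }
  {
    "field" : "ba}r"
  }
JSON

parse_concatenated(input).each { |doc| p doc }
```

JSON.parse tolerates the leading whitespace left between documents, so the pieces do not need to be stripped first.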

Upvotes: 0

Steve

Reputation: 7098

The yajl-ruby gem enables processing concatenated JSON in Ruby. The parser can read from a String or an IO. Each complete object is yielded to a block.

require 'yajl'

File.open 'file.json' do |f|
  Yajl.load f do |object|
    # do something with object
  end
end

See the documentation for other options (buffer size, symbolized keys, etc).

Upvotes: 0

spickermann

Reputation: 106792

As long as your file is that simple (flat objects, with no nested objects and no braces inside string values), you might want to do something like this:

# content = File.read(filename)
content =<<-EOF
  {
    "field" : "value",
    "another_field": "another_value"
  } // no comma
  { 
    "field" : "value"
  }
EOF

require 'json'

JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
#=> [{"field"=>"value", "another_field"=>"another_value"}, {"field"=>"value"}]
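To make the "that simple" caveat concrete, here is a small sketch of where the substitution stops working. The wrapper name wrap_and_parse and both sample strings are assumptions for illustration. The regex rewrites everything between any } and the next {, so a nested object loses a closing brace and the result no longer parses.

```ruby
require 'json'

# Hypothetical wrapper around the gsub approach shown above.
def wrap_and_parse(content)
  JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
end

flat   = %({"a": 1}\n{"b": 2})
nested = %({"a": {"b": 1}}\n{"c": 2})

# Flat objects are joined with commas and parse as one array.
p wrap_and_parse(flat)

# With nesting, the inner "}" is treated as the end of a document,
# so the rewritten text is no longer valid JSON.
begin
  wrap_and_parse(nested)
rescue JSON::ParserError => e
  puts "nested input is not handled: #{e.class}"
end
```

If the third-party tool can ever emit nested objects, a brace-counting or streaming approach is likely the safer choice.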

Upvotes: 1
