user2800390

Reputation: 83

In Ruby, how can I recursively populate a Mongo database using nested arrays as input?

I have been using Ruby for a while, but this is my first time doing anything with a database. I've been playing around with MongoDB for a bit and, at this point, I've begun to try and populate a simple database.

Here is my problem. I have a text file containing data in a particular format. When I read that file in, the data is stored in nested arrays like so:

dataFile = ["sectionName", ["key1", "value1"], ["key2", "value2", ["key3", ["value3A", "value3B"]]]

The format will always be that the first value of the array is a string and each subsequent value is an array. Each array is formatted as a key/value pair. However, the value can be a string, an array of two strings, or a series of arrays that have their own key/value array pairs. I don't know any details about the data file before I read it in, just that it conforms to these rules.

Now, here is what I want to do. I want to read this into a Mongo database while preserving this basic structure. So, for instance, if I were to do this by hand, it would look like this:

newDB = mongo_client.db("newDB")
newCollection = newDB["dataFile1"]
doc = {"section_name" => "sectionName", "key1" => "value1", "key2" => "value2", "key3" => ["value3A", "value3B"]}
ID = newCollection.insert(doc)

I know there has to be an easy way to do this. So far, I've been trying various recursive functions to parse the data out, turn it into mongo commands and try to populate my database. But it just feels clunky, like there is a better way. Any insight into this problem would be appreciated.

Upvotes: 1

Views: 267

Answers (2)

Gary Murakami

Reputation: 3402

The following test contains two solutions. The first converts the input to a nested Hash, which I think is what you want, without flattening the input data. The second stores the key-value pairs exactly as given in the input. I've chosen to fix the missing closing square bracket in a way that preserves the key-value pairs.

The main point here is that while the top-level data structure for MongoDB is a document, mapped to a Ruby Hash that by definition has key-value structure, the values can be any shape, including nested arrays or hashes. I hope the test examples cover the range and show that you can match storage in MongoDB to fit your needs.

test.rb

require 'mongo'
require 'test/unit'
require 'pp'

class MyTest < Test::Unit::TestCase
  def setup
    @coll = Mongo::MongoClient.new['test']['test']
    @coll.remove
    @dataFile = ["sectionName", ["key1", "value1"], ["key2", "value2"], ["key3", ["value3A", "value3B"]]]
    @key, *@value = @dataFile
  end

  test "nested array data as hash value" do
    input_doc = {@key => Hash[*@value.flatten(1)]}
    @coll.insert(input_doc)
    fetched_doc = @coll.find.first
    assert_equal(input_doc[@key], fetched_doc[@key])
    puts "#{name} fetched hash value doc:"
    pp fetched_doc
  end

  test "nested array data as array value" do
    input_doc = {@key => @value}
    @coll.insert(input_doc)
    fetched_doc = @coll.find.first
    assert_equal(input_doc[@key], fetched_doc[@key])
    puts "#{name} fetched array doc:"
    pp fetched_doc
  end
end

ruby test.rb

$ ruby test.rb
Loaded suite test
Started
test: nested array data as array value(MyTest) fetched array doc:
{"_id"=>BSON::ObjectId('5357d4ac7f11ba0678000001'),
 "sectionName"=>
  [["key1", "value1"], ["key2", "value2"], ["key3", ["value3A", "value3B"]]]}
.test: nested array data as hash value(MyTest) fetched hash value doc:
{"_id"=>BSON::ObjectId('5357d4ac7f11ba0678000002'),
 "sectionName"=>
  {"key1"=>"value1", "key2"=>"value2", "key3"=>["value3A", "value3B"]}}
.

Finished in 0.009493 seconds.

2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed

210.68 tests/s, 210.68 assertions/s
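
The two document shapes used in the tests can also be tried without a running MongoDB instance. The following is a minimal sketch that only builds the Ruby hashes; the variable names (data_file, pairs, as_hash, as_array) are chosen here for illustration, and Array#to_h is shown as an alternative that assumes Ruby 2.1 or later.

```ruby
# Same input as in the test setup, with the closing bracket fixed.
data_file = ["sectionName", ["key1", "value1"], ["key2", "value2"], ["key3", ["value3A", "value3B"]]]

# Split off the section name; the rest is a list of [key, value] pairs.
key, *pairs = data_file

# Shape 1: the pairs become a nested Hash under the section name.
# flatten(1) removes only one level, so the array value of key3 survives.
as_hash = { key => Hash[*pairs.flatten(1)] }

# Shape 2: the pairs are stored unchanged as an array of arrays.
as_array = { key => pairs }

# Array#to_h (Ruby 2.1+) builds the same Hash as Hash[*pairs.flatten(1)].
same_hash = pairs.to_h

p as_hash
p as_array
```

Either hash could then be handed to the collection's insert call as in the tests above.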

Upvotes: 0

gautamc

Reputation: 423

The value that you gave for the variable dataFile isn't a valid array, because it is missing a closing square bracket.

If we make the definition of dataFile a valid line of Ruby code, the following code yields the hash that you described. It uses map.with_index to visit each element of the array and transform it into a new array of key/value hashes. This transformed array of hashes is then flattened and converted into a single hash using the inject method.

dataFile = ["sectionName", ["key1", "value1"], ["key2", "value2", ["key3", ["value3A", "value3B"]]]]
puts dataFile.map.with_index {
  |e, ix|
  case ix
  when 0
    { "section_name" => e }
  else
    list = []
    list.push( { e[0] => e[1] } )
    if( e.length > 2 )
      list.push(
        e[2..e.length-1].map {|p|
          { p[0] => p[1] }
        }
      )
    end
    list
  end
}.flatten.inject({ }) {
  |accum, e|
  key = e.keys.first
  accum[ key ] = e[ key ]
  accum
}.inspect

The output looks like:

{"section_name"=>"sectionName", "key1"=>"value1", "key2"=>"value2", "key3"=>["value3A", "value3B"]}

For input that looked like this:

["sectionName", ["key1", "value1"], ["key2", "value2", ["key3", ["value3A", "value3B"]], ["key4", ["value4A", "value4B"]]], ["key5", ["value5A", "value5B"]]]

We would see:

{"section_name"=>"sectionName", "key1"=>"value1", "key2"=>"value2", "key3"=>["value3A", "value3B"], "key4"=>["value4A", "value4B"], "key5"=>["value5A", "value5B"]}

Note the keys "key3" and "key4", which are what I take "a series of arrays" to mean. If the structure is an array of arrays of unknown depth, then we would need a different implementation, perhaps one that uses an array to keep track of the position as the program walks through the arbitrarily nested array of arrays.
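
For the unknown-depth case, one possible approach is plain recursion rather than explicit position tracking. The sketch below is not the code from this answer. The helper names (pairs_to_hash, convert_value, section_to_doc) are hypothetical, and it assumes that a value whose elements are all arrays is a series of key/value pairs, while any other value (a string or an array of strings) is stored as-is. Elements that follow the value inside a pair are treated as sibling pairs, which matches the flat output shown above.

```ruby
# Hypothetical helper: turn a list of [key, value, *more_pairs] arrays
# into a Hash, recursing into nested pair lists.
def pairs_to_hash(pairs)
  pairs.each_with_object({}) do |(key, value, *rest), doc|
    doc[key] = convert_value(value)
    # Assumption: anything after the value is a list of sibling pairs
    # that belong at the same level, as in ["key2", "value2", ["key3", ...]].
    doc.merge!(pairs_to_hash(rest)) unless rest.empty?
  end
end

# Hypothetical helper: a non-empty array made up only of arrays is read
# as a series of key/value pairs; everything else is kept unchanged.
def convert_value(value)
  if value.is_a?(Array) && !value.empty? && value.all? { |v| v.is_a?(Array) }
    pairs_to_hash(value)
  else
    value
  end
end

# Build the document: the first element is the section name,
# the remaining elements are the key/value pairs.
def section_to_doc(data)
  name, *pairs = data
  { "section_name" => name }.merge(pairs_to_hash(pairs))
end

data_file = ["sectionName", ["key1", "value1"], ["key2", "value2", ["key3", ["value3A", "value3B"]]]]
doc = section_to_doc(data_file)

# A deeper input, made up here to exercise the recursion.
nested = ["s", ["k", [["a", "1"], ["b", [["c", "2"]]]]]]
nested_doc = section_to_doc(nested)
```

The resulting doc is an ordinary Hash, so it could be passed to the collection's insert call. Whether nested pair lists should become nested hashes or be flattened is a design choice that depends on how the data will be queried.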

Upvotes: 1
