fex

Reputation: 307

Ruby MongoDB and large documents

I have a populated MongoDB database.

Now I need to add a huge amount of additional data (log file data) to my documents. This data exceeds the BSON document size limit:

Document too large: This BSON document is limited to 16777216 bytes. (BSON::InvalidDocument)

A simplified example of my situation would look like this:

require 'mongo'
include Mongo

cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
db = cli.db("testdb")
coll = db.collection("test")

# the :log_file value alone pushes this document past the 16 MB limit
data = {:name => "Customer1", :data1 => "some value", :log_file => "A" * 17_000_000}

coll.save data
  1. What is the best way to add this huge amount of data?
  2. Could I use GridFS to store those files and link the GridFS file handle to the correct document?
    1. Could I then access the GridFS file during queries?

Upvotes: 0

Views: 220

Answers (3)

fex

Reputation: 307

The paragraph about document growth finally answered my question. (Found by following Konrad's link.)

http://docs.mongodb.org/manual/core/data-model-operations/#data-model-document-growth

What I am now basically doing is this:

require 'mongo'
include Mongo

cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
db = cli.db("testdb")
coll = db.collection("test")
grid = Grid.new db

# store data: put the oversized payload into GridFS and keep only its id in the document
id = grid.put "A" * 17_000_000
data = {:name => "Customer1", :data1 => "some value", :log_file => id}
coll.save data

# access data: look up the document, then fetch the file from GridFS
cust = coll.find({:name => "Customer1"})
id = cust.first["log_file"]
data = grid.get id   # returns a GridIO handle; data.read gives the stored contents

Upvotes: 1

Konrad Kleine

Reputation: 4479

Maybe you can split your document up into smaller documents and reference them. See this SO post: syntax for linking documents in mongodb
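
For example, a minimal sketch of this manual-reference approach (the log_chunks collection, the chunk size and the field names are made up for illustration), splitting the log into pieces that fit under the 16 MB BSON limit:

require 'mongo'
include Mongo

cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
db = cli.db("testdb")
coll = db.collection("test")
logs = db.collection("log_chunks")   # hypothetical collection holding the chunks

# split the oversized payload into pieces that stay under the 16 MB BSON limit
log_data = "A" * 17_000_000
chunk_size = 15_000_000
chunk_ids = (0...log_data.size).step(chunk_size).map do |offset|
  logs.insert(:data => log_data[offset, chunk_size])   # insert returns the chunk's _id
end

# the main document only stores references to the chunks
coll.save(:name => "Customer1", :data1 => "some value", :log_chunk_ids => chunk_ids)

# reassemble the log by following the references in order
cust = coll.find_one(:name => "Customer1")
full_log = cust["log_chunk_ids"].map { |cid| logs.find_one(:_id => cid)["data"] }.join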

Upvotes: 1

Stefan Dorunga

Reputation: 679

I would suggest two approaches:

GridFS, with instructions here: https://github.com/mongodb/mongo-ruby-driver/wiki/GridFS

  • Advantages: uses an already existing service (MongoDB) to store the files, so it is presumably the easiest to implement and the cheapest, since you already have the infrastructure.

  • Disadvantage: not necessarily the best use of a database that wants to keep its working set in memory, especially if it's used for other storage as well.

S3 - Store the files in a hosted data service (such as Amazon S3) which is designed for file storage (redundant, replicated and highly available). In this case you just upload the files and store a pointer to their S3 location in your DB (see the sketch after this list).

  • Advantage: keeps your DB leaner and is probably cheaper, since you keep your mongo machines optimised for doing mongo things (i.e. high memory) and take advantage of the really cheap file storage on S3 as well as its near-infinite scalability.

  • Disadvantage: harder to implement, since you need to write your own code to do this, though there may be off-the-shelf solutions somewhere.
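
A rough sketch of the S3 variant (assuming the aws-sdk-s3 gem; the bucket name, key, region, local path and field names are made up for illustration, and the MongoClient style from the question is kept):

require 'mongo'
require 'aws-sdk-s3'
include Mongo

cli = MongoClient.new("localhost", MongoClient::DEFAULT_PORT)
coll = cli.db("testdb").collection("test")

# upload the log file to S3 (bucket, key, region and local path are placeholders)
s3 = Aws::S3::Resource.new(region: "us-east-1")
obj = s3.bucket("my-log-bucket").object("logs/customer1.log")
obj.upload_file("/path/to/customer1.log")

# the document only stores a pointer to the S3 object
coll.save(:name => "Customer1", :data1 => "some value",
          :log_bucket => obj.bucket_name, :log_key => obj.key)

# later: follow the pointer to fetch the log from S3
cust = coll.find_one(:name => "Customer1")
log = s3.bucket(cust["log_bucket"]).object(cust["log_key"]).get.body.read

The MongoDB document stays small because it only carries the bucket/key pointer; the log itself lives on S3.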

There is some more useful discussion on this SO post.

Upvotes: 1
