Doumor

Reputation: 529

Is it efficient to store images inside MongoDB using GridFS?

I know how to do it, but I wonder whether it's efficient. As far as I know, MongoDB clusters scale well, and I can flexibly control the collections and the servers they reside on. My only concerns are the size of the files and the speed of accessing them through MongoDB.

Should I explore something like Apache Hadoop, or will intelligently clustering MongoDB give me similar access speeds?

Upvotes: 3

Views: 5153

Answers (3)

Doumor

Reputation: 529

Anyway, I did a little investigating. The short conclusion: if you only need to store user avatars, MongoDB is fine, as long as it's a single avatar per user (you shouldn't store many blobs inside MongoDB). If you need to store videos, or simply many large files, then you need something like CephFS.

Why do I think so? When I tested MongoDB with media files on a slow instance, files of up to 10 MB (usually about 1 MB) took up to 3000 milliseconds to come back. That's unacceptably slow, and with many files (100+) it can turn into a pain. A real pain.

Ceph is designed precisely for storing files, up to petabytes of them. That's what's needed here.

How do you implement this in a real project? If you use an ODM for MongoDB such as Mongoose, you can add methods to your model objects that talk to Ceph and do what you need: "load file", "delete file", "count files", and so on, and then use everything together as usual. Don't forget to maintain Ceph and add servers as needed, and everything will work smoothly. The files themselves should be accessed only through your web server, never directly: when a user requests a file, the web server forwards the request to Ceph and returns Ceph's response to the user.
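The pattern above can be sketched roughly as follows. This is a hypothetical illustration, not real driver code: a real setup would talk to Ceph through librados or its S3-compatible RADOS Gateway, and the model methods would live on Mongoose schemas; here an in-memory class stands in for the Ceph client so the sketch is self-contained.

```python
class FakeCephStore:
    """Stand-in for a Ceph client (replace with a real RADOS/S3 client)."""

    def __init__(self):
        self._objects = {}  # object key -> bytes

    def put(self, key, data: bytes):
        self._objects[key] = data

    def get(self, key) -> bytes:
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def count(self) -> int:
        return len(self._objects)


class UserAvatar:
    """Document-like object whose methods wrap the object store,
    mirroring the "load file" / "delete file" methods described above."""

    def __init__(self, user_id, store):
        self.user_id = user_id
        self.store = store

    def save_file(self, data: bytes):
        self.store.put(f"avatar/{self.user_id}", data)

    def load_file(self) -> bytes:
        return self.store.get(f"avatar/{self.user_id}")

    def delete_file(self):
        self.store.delete(f"avatar/{self.user_id}")


store = FakeCephStore()
avatar = UserAvatar("u1", store)
avatar.save_file(b"\x89PNG...")
print(store.count())       # 1
print(avatar.load_file())  # b'\x89PNG...'
```

The MongoDB document then only needs to hold the object key, while all blob I/O goes through the store methods — and through your web server, per the advice above.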

I hope I helped more than just myself. I'll go add Ceph to my tags. Good luck!

GridFS

Ceph File System

More Ceph

Upvotes: 2

D. SM

Reputation: 14510

GridFS is provided for convenience; it is not designed to be the ultimate binary blob storage platform.

MongoDB imposes a limit of 16 MB on each document it stores. This is unlike, for example, many relational databases which permit much larger values to be stored.

Since many applications deal with large binary blobs, MongoDB's solution to this problem is GridFS, which roughly works like this:

  • For each blob to be inserted, a metadata document is inserted into the metadata collection.
  • Then, the actual blob is split into chunks (255 KiB each by default, so every chunk document stays well under the 16 MB document limit) and uploaded as a sequence of documents into the blob collection.
  • MongoDB drivers provide helpers for writing and reading the blobs and the metadata.
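The steps above can be sketched as a pure function. This is a simplified illustration of the chunking scheme, not actual driver code (real drivers store the pieces in `fs.files` and `fs.chunks` collections with additional fields); the 255 KiB default chunk size matches GridFS's documented default.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)

def split_blob(blob: bytes, filename: str):
    """Split a blob into a metadata document plus a list of chunk documents."""
    metadata = {
        "filename": filename,
        "length": len(blob),
        "chunkSize": CHUNK_SIZE,
    }
    chunks = [
        {"n": i, "data": blob[off:off + CHUNK_SIZE]}
        for i, off in enumerate(range(0, len(blob), CHUNK_SIZE))
    ]
    return metadata, chunks

meta, chunks = split_blob(b"x" * (600 * 1024), "photo.jpg")
print(meta["length"], len(chunks))  # 614400 3
```

A 600 KiB blob yields three chunk documents; the driver reassembles them in order of `n` when the blob is read back.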

Thus, on first glance, the problem is solved - the application can store arbitrarily large blobs in a straightforward manner. However, digging deeper, GridFS has the following issues/limitations:

  • On the server side, documents storing blob chunks aren't stored separately from other documents. As such they compete for cache space with the actual documents. A database which has both content documents and blobs is likely to perform worse than a database that has only content documents.
  • At the same time, since the blob chunks are stored in the same way as content documents, storing them is generally expensive. For example, S3 is much cheaper than EBS storage, and GridFS would put all data on EBS.
  • To my knowledge there is no support for parallel writes or parallel reads of the blobs (writing/reading several chunks of the same blob at a time). This can in principle be implemented, either in MongoDB drivers or in an application, but as far as I know this isn't provided out of the box by any driver. This limits I/O performance when the blobs are large.
  • Similarly, if a read or write fails, the entire blob must be re-read or re-written as opposed to just the missing fragment.

Despite these issues, GridFS may be a fine solution for many use cases:

  • If the overall data size isn't very large, the negative cache effects are limited.
  • If most of the blobs fit in a single chunk document, their storage should be quite efficient.
  • The blobs are backed up and otherwise transferred together with the content documents in the database, improving data consistency and reducing the risk of data loss/inconsistencies.

Upvotes: 4

NeNaD

Reputation: 20334

Good practice is to upload the image somewhere (your own server or cloud storage) and then store only the image URL in MongoDB.
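A minimal sketch of that pattern. The `upload_to_storage` helper and the CDN URL are hypothetical stand-ins for your actual upload step (e.g. an S3 `put_object` call); only the resulting URL goes into the MongoDB document.

```python
def upload_to_storage(filename: str, data: bytes) -> str:
    # Hypothetical helper: a real version would write to disk or cloud
    # storage and return the public URL; here it just fabricates one.
    return f"https://cdn.example.com/images/{filename}"

def make_user_doc(name: str, avatar_bytes: bytes) -> dict:
    url = upload_to_storage(f"{name}.png", avatar_bytes)
    # The document stores only the URL, not the binary data itself.
    return {"name": name, "avatar_url": url}

doc = make_user_doc("alice", b"\x89PNG...")
print(doc["avatar_url"])  # https://cdn.example.com/images/alice.png
```

The document stays small, MongoDB's cache holds only content documents, and the web server or CDN serves the binary directly.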

Upvotes: 2
