RaYell

Reputation: 70494

How to efficiently store hundreds of thousands of documents?

I'm working on a system that will need to store a lot of documents (PDFs, Word files, etc.). I'm using Solr/Lucene to search for relevant information extracted from those documents, but I also need a place to store the original files so that users can open or download them.

I was thinking about several possibilities:

The storage I'm looking for should be:

What would you recommend as the best way to store those files?

Upvotes: 3

Views: 2208

Answers (4)

Piotr Czapla

Reputation: 26582

You can follow Facebook's example, as it stores a lot of files (15 billion photos):

  • They initially started with an NFS share served by commercial storage appliances.
  • Then they moved to their own implementation of an HTTP file server, called Haystack.

Here is a Facebook note if you want to learn more: http://www.facebook.com/note.php?note_id=76191543919

Regarding the NFS share: keep in mind that NFS shares usually limit the number of files in one folder for performance reasons. (This may seem counterintuitive if you assume that all recent file systems use B-trees to store their structure.) So if you are using a commercial NFS share (like NetApp), you will likely need to spread files across multiple folders.

You can do that if you have any kind of ID for your files. Just split its ASCII representation into groups of a few characters and make a folder for each group. For example, we use integers for IDs, so the file with ID 1234567891 is stored as storage/0012/3456/7891.
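As a rough illustration of the scheme above, here is a minimal Python sketch. The function name `sharded_path` and its parameters are made up for this example; the 12-digit zero padding and groups of four digits are inferred from the storage/0012/3456/7891 example, so adjust them to your own ID range.

```python
def sharded_path(file_id: int, root: str = "storage",
                 width: int = 12, group: int = 4) -> str:
    """Map an integer ID to a nested path, e.g. storage/0012/3456/7891.

    `width` and `group` are assumptions taken from the example above.
    """
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    digits = str(file_id)
    # Pad to at least `width` digits, and always to a multiple of `group`
    # so that every path component has the same length.
    target = max(width, len(digits))
    target = -(-target // group) * group  # round up to a multiple of group
    digits = digits.zfill(target)
    # Split the padded string into fixed-size chunks.
    parts = [digits[i:i + group] for i in range(0, len(digits), group)]
    return "/".join([root, *parts])


print(sharded_path(1234567891))
```

With groups of four digits, no folder ever holds more than 10,000 entries, which keeps each directory well within typical per-folder limits.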

Hope that helps.

Upvotes: 1

Chathuranga Chandrasekara

Reputation: 20956

File system: looking at the big picture, a DBMS ends up using the file system anyway. And the file system is dedicated to keeping files, so it already has the optimizations for that (as LukeH mentioned).

Upvotes: 0

Mark Redman

Reputation: 24545

In my opinion...

I would store files compressed onto disk (file system) and use a database to keep track of them.

and possibly use SQLite if this is its only job.
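A minimal sketch of that idea, using only Python's standard `gzip` and `sqlite3` modules. The table layout and the helper names (`open_index`, `store_document`, `load_document`) are assumptions made for illustration, not anything from a real system.

```python
import gzip
import sqlite3
from pathlib import Path


def open_index(db_path):
    """Open (or create) the SQLite database that tracks stored files."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " original_name TEXT NOT NULL,"
        " stored_path TEXT NOT NULL)"
    )
    return conn


def store_document(conn, root, name, data):
    """Compress `data`, write it under `root`, and record it in the index."""
    # Insert first so the database assigns the ID used for the file name.
    cur = conn.execute(
        "INSERT INTO documents (original_name, stored_path) VALUES (?, '')",
        (name,),
    )
    doc_id = cur.lastrowid
    target = Path(root) / f"{doc_id}.gz"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(gzip.compress(data))
    conn.execute(
        "UPDATE documents SET stored_path = ? WHERE id = ?",
        (str(target), doc_id),
    )
    conn.commit()
    return doc_id


def load_document(conn, doc_id):
    """Return (original_name, decompressed bytes) for a stored document."""
    row = conn.execute(
        "SELECT original_name, stored_path FROM documents WHERE id = ?",
        (doc_id,),
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    name, path = row
    return name, gzip.decompress(Path(path).read_bytes())
```

The database holds only metadata and paths here; the file contents stay on the file system.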

Upvotes: 0

LukeH

Reputation: 269658

A filesystem -- as the name suggests -- is designed and optimised to store large numbers of files in an efficient and scalable way.

Upvotes: 5
