Felquir
Reputation: 441

best db to store files

All,

I need to store a large number of files (a few million) in a database, and I'm not sure which technology or database to use. My first idea is to use MongoDB or another NoSQL database.

Thanks

Upvotes: 13

Views: 18181

Answers (3)

Shvedkov Ivan
Reputation: 39

Working with that many files can reduce your system's efficiency and overload the database. The best way to handle file maintenance is to implement an S3 storage integration.

Upvotes: 0

I need to store a large amount of files (few millions) in a database

What does that mean? What exactly do you store in the database (that is, what is a file for you)? On which operating system? For which file system?

(I am implicitly thinking of some Linux or Unix-like OS, because they are so common for Internet and Web servers; remember that Unix files are actually i-nodes, that directories associate names with i-nodes, and that a file can have several file paths.)

  • file paths, they are just strings (of reasonable length, quite often at most a few kilobytes) with some restrictions; BTW you might "normalize" a path (e.g. with realpath(3)) before storing it in the DB.

  • file contents, they are just "blobs", that is potentially large but arbitrary sequences of bytes. Here you have the issue of putting variously sized blobs in a DB; a file can have gigabytes (or even terabytes) of content, a blob usually doesn't. Most DB systems handle a blob as a whole (e.g. keep it in RAM). Can you afford a limit (e.g. a few megabytes) on your files' size?

  • the file metadata (e.g. mtime, permission, ownership) is generally also quite small (it might for example be represented by a few short columns in some SQL table)
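To make the three parts above concrete, here is a minimal sketch that records only the normalized path and a few metadata fields for a file, leaving the contents on disk. SQLite, the `files` table, its columns and the `index_file` helper are all assumptions made for illustration; any RDBMS with a similar table would do.

```python
import os
import sqlite3
import tempfile


def index_file(conn, path):
    """Record the normalized path and a few metadata fields for one file."""
    # os.path.realpath resolves symlinks and "." / ".." components,
    # much like realpath(3).
    real = os.path.realpath(path)
    st = os.stat(real)
    conn.execute(
        "INSERT OR REPLACE INTO files (path, size, mtime, mode, uid) "
        "VALUES (?, ?, ?, ?, ?)",
        (real, st.st_size, st.st_mtime, st.st_mode, st.st_uid),
    )
    return real


# Hypothetical schema: one row per file, a few short metadata columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files ("
    "path TEXT PRIMARY KEY, size INTEGER, mtime REAL, mode INTEGER, uid INTEGER)"
)

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "example.txt")
    with open(p, "wb") as f:
        f.write(b"hello")
    # Pass a non-normalized path to show the normalization step.
    stored = index_file(conn, os.path.join(d, ".", "example.txt"))
    row = conn.execute("SELECT path, size FROM files").fetchone()
```

Because the path is the primary key, indexing the same file again through a different spelling of its path updates the existing row instead of adding a duplicate.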

So what do "files" mean for you? Perhaps you want to code an application that uses a database for storage and provides a file system abstraction to the OS. In that case, look into file systems in user space (FUSE).

BTW, a file exists independently of your DB (since files are an abstraction provided by your OS). It might be and often is created, read, written or deleted by some outside programs. On Linux, consider inotify(7) facilities to be notified of file system events (for a local ordinary file system such as ext4).

Notice that these days most DBMSs (both RDBMSs such as PostgreSQL or MySQL and non-SQL DBMSs such as MongoDB) store their data in files (that is, using raw disk partitions for DB storage has gone out of fashion).

Since many DBMSs set some limits on contents (e.g. a blob in a row of some table might be limited to a few dozen kilobytes in some RDBMS), it is common to do the opposite of what your (unclear) question suggests. A typical example is keeping images in a database. Often you'll separate small images (e.g. less than 8 Kbytes), which you store directly as blobs in some table (remember that tiny files of a few bytes have a large overhead in most file systems, e.g. on my computer with ext4 a file consumes at least a kilobyte of disk space), from larger images, which you store in a file system (under some file path like 0123/4567/89ab.jpeg) while keeping only the file path in some column. YMMV.
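The small-versus-large split described above could be sketched roughly as follows. Everything named here is an assumption for illustration: SQLite as the database, the `images` table, the `store` and `load` helpers, the 8 KB threshold (borrowed from the example figure above), and the choice of deriving the nested directory layout from a SHA-256 digest of the contents.

```python
import hashlib
import os
import sqlite3
import tempfile

INLINE_LIMIT = 8 * 1024  # assumed threshold, from the 8 Kbytes example


def store(conn, root, name, data):
    """Keep small payloads as a blob; write large ones to disk and store the path."""
    if len(data) < INLINE_LIMIT:
        conn.execute(
            "INSERT INTO images (name, content, path) VALUES (?, ?, NULL)",
            (name, data),
        )
        return None
    # Assumed layout: split a content digest into nested directories so that
    # no single directory ends up with millions of entries.
    digest = hashlib.sha256(data).hexdigest()
    rel = os.path.join(digest[:4], digest[4:8], digest[8:])
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "wb") as f:
        f.write(data)
    conn.execute(
        "INSERT INTO images (name, content, path) VALUES (?, NULL, ?)",
        (name, rel),
    )
    return rel


def load(conn, root, name):
    """Return the bytes for `name`, from the blob column or from the file system."""
    content, rel = conn.execute(
        "SELECT content, path FROM images WHERE name = ?", (name,)
    ).fetchone()
    if rel is None:
        return bytes(content)
    with open(os.path.join(root, rel), "rb") as f:
        return f.read()


root = tempfile.mkdtemp()  # stands in for the directory holding large files
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT PRIMARY KEY, content BLOB, path TEXT)")
small_path = store(conn, root, "icon", b"x" * 100)     # stored inline
big_path = store(conn, root, "photo", b"y" * 20000)    # stored on disk
```

Callers only ever use `load`, so the threshold can be tuned later without changing how the data is read back.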

Upvotes: -2

Jason Jiménez
Reputation: 369

If you want to run queries or keyword searches over the file contents, I'd recommend Elasticsearch; Apache Solr is another option.

If you need something more specific, I would need more information.

Upvotes: 3
