Sriram

Reputation: 1738

Database vs File system storage

A database ultimately stores its data in files, and a file system also stores data in files. In that case, what is the difference between a DB and a file system? Is it in the way the data is retrieved, or something else?

Upvotes: 152

Views: 184937

Answers (7)

MQuiggGeorgia

Reputation: 769

Great answers here, just wanted to add a few points. One, consider caching, which can significantly affect performance. Two, test for your particular application. Maybe test database, file system, hybrid. Test test test. Only way to really know what will work best because there are so many variables. Three, Engineering 101: keep it as simple as possible. As a system's complexity grows, the probability of problems arising increases exponentially.

Upvotes: 1

Tahir Alvi

Reputation: 994

The main differences between database and file system storage are:

  1. A database is a software application used to insert, update, and delete data, while a file system is software used to add, update, and delete files.
  2. Saving and retrieving files is simpler in a file system, whereas using a database requires learning SQL to perform queries such as SELECT, INSERT, and UPDATE.
  3. Databases provide a proper data recovery process, whereas file systems do not.
  4. In terms of security, databases are generally more secure than file systems.
  5. Migration in a file system is very easy, as it involves just copying and pasting into the target location, whereas this task is more complex for databases.

Upvotes: 2

Vicky

Reputation: 2107

A database is generally used for storing related, structured data, with well defined data formats, in an efficient manner for insert, update and/or retrieval (depending on application).

On the other hand, a file system is a more unstructured data store for storing arbitrary, probably unrelated data. The file system is more general, and databases are built on top of the general data storage services provided by file systems. [Quora]

And from https://dba.stackexchange.com/a/23125/

The file system is useful if you are looking for a particular file, as operating systems maintain a sort of index. However, the contents of a txt file won't be indexed, which is one of the main advantages of a database.

For very complex operations, the filesystem is likely to be very slow.

Main RDBMS advantages:

  • Tables are related to each other

  • SQL query/data processing language

  • Transaction processing in addition to SQL (Transact-SQL)

  • Server-client implementation with server-side objects like stored procedures, functions, triggers, views, etc.

The advantage of a file system over a database management system is:

When handling small data sets with arbitrary, probably unrelated data, files are more efficient than a database. For simple read and write operations, file operations are faster and simpler.

You can find any number of other differences on the internet.

Upvotes: 98

dkellner

Reputation: 9986

"They're the same"

Yes, storing data is just storing data. At the end of the day, you have files. You can store lots of stuff in lots of files & folders, and there are situations where this is the way to go. There is a well-known versioning solution (svn) that eventually moved to a filesystem-based model to store data, ditching BerkeleyDB. Rare, but it happens.

"They're quite different"

In a database, you have options you don't have with files. Imagine a textfile (something like tsv/csv) with 99999 rows. Now try to:

  • Insert a column. It's painful, you have to alter each row and read+write the whole file.
  • Find a row. You either scan the whole file or build an index yourself.
  • Delete a row. Find row, then read+write everything after it.
  • Reorder columns. Again, full read+write.
  • Sort rows. Full read, some kind of sort - then do it next time all over.
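
To make the list above concrete, here is a minimal sketch in Python comparing a lookup in CSV text with the same lookup in SQLite. The table name `people`, the columns, and the generated sample rows are made up for illustration; this is only one way to set up the comparison, not a benchmark.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: 99,999 rows of (id, name), as in the thought experiment.
ROWS = [(i, f"name{i}") for i in range(1, 100_000)]

def build_csv(rows):
    """Serialize the rows as CSV text (stands in for a file on disk)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def find_in_csv(text, wanted_id):
    """Linear scan: rows are read one by one until the id matches."""
    for row in csv.reader(io.StringIO(text)):
        if int(row[0]) == wanted_id:
            return row[1]
    return None

def build_db(rows):
    """Load the same rows into an in-memory SQLite table keyed by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
    conn.commit()
    return conn

def find_in_db(conn, wanted_id):
    """Primary-key lookup: SQLite walks its B-tree instead of scanning rows."""
    row = conn.execute(
        "SELECT name FROM people WHERE id = ?", (wanted_id,)
    ).fetchone()
    return row[0] if row else None

csv_text = build_csv(ROWS)
conn = build_db(ROWS)
csv_hit = find_in_csv(csv_text, 99_999)  # worst case: scans the whole file
db_hit = find_in_db(conn, 99_999)

# Adding a column is one statement here; the CSV version rewrites every row.
conn.execute("ALTER TABLE people ADD COLUMN email TEXT")
```

Both lookups return the same value; the difference is how much data each one has to touch to get there.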

There are lots of other good points, but these are the first mountains you have to climb when you think of a file-based db alternative. Database developers have programmed all this for you, and it's yours to use; think of the likely (most frequent) scenarios, enumerate all possible actions you want to perform on your data, and decide which one works better for you. Think in benefits, not fashion.

Again, if you're storing JPG pictures and only ever look them up by one key (their id, maybe?), a well-thought-out filesystem layout is better. Filesystems, btw, are close to databases today, as many of them use a balanced-tree approach, so on BTRFS you can just put all your pictures in one folder, and the filesystem will silently do something like an indexed lookup each time you access a file.
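
For the single-key picture case, a minimal Python sketch of one possible layout is below. The sharding into two directory levels by hash prefix is my assumption, a common precaution for filesystems that handle very large directories poorly, and is not something this answer prescribes. The function names `blob_path`, `put`, and `get` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def blob_path(root, key):
    """Map a key to root/ab/cd/<sha256 hex>, spreading files over subfolders."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / digest

def put(root, key, data):
    """Write the bytes for a key, creating shard directories as needed."""
    path = blob_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path

def get(root, key):
    """Read the bytes for a key; raises FileNotFoundError if it is missing."""
    return blob_path(root, key).read_bytes()

# Usage with a throwaway directory and placeholder bytes instead of a real JPG.
with tempfile.TemporaryDirectory() as root:
    put(root, "photo-42", b"\xff\xd8 placeholder")
    roundtrip = get(root, "photo-42")
```

The lookup is a pure function of the key, so no index has to be maintained; the trade-off is that you can only find a picture by that one key.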

So, database or files?...
Let's see a few typical examples where one is better than the other. (These are not complete lists; surely you can add a lot more on both sides.)

DB tables are much better when:

  • You want to store many rows with the exact same structure (no block waste)
  • You need lightning-fast lookup / sorting by more than one value (indexed tables)
  • You need atomic transactions (data safety)
  • Your users will read/write the same data all the time (better locking)
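
As an illustration of the atomic-transactions point, here is a small Python sketch using SQLite. The `accounts` table, the names, and the amounts are invented for the example; it only shows the general idea that a failed multi-step change leaves no partial result behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint if src cannot afford it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is undone
```

With two plain files, the equivalent of the second call would leave the credit written and the debit missing unless you build the rollback logic yourself.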

Filesystem is way better if:

  • You like to use version control on your data (a nightmare with dbs)
  • You have big chunks of data that grow frequently (typically, logfiles)
  • You want other apps to access your data without API (like text editors)
  • You want to store lots of binary content (pictures or mp3s)

TL;DR

Programming rarely says "never" or "always". Those who say "database always wins" or "files always win" probably just don't know enough. Think of the possible actions (now + future), consider both ways, and choose the fastest / most efficient for the case. That's it.

Upvotes: 49

zupa

Reputation: 13442

Context: I've written a filesystem that has been running in production for 7 years now. [1]

The key difference between a filesystem and a database is that the filesystem API is part of the OS, thus filesystem implementations have to implement that API and thus follow certain rules, whereas databases are built by 3rd parties having complete freedom.

Historically, databases were created when the filesystems provided by the OS were not good enough for the problem at hand. Just think about it: if you had special requirements, you couldn't just call Microsoft or Apple to redesign their filesystem API. You would either go ahead and write your own storage software or you would look around for existing alternatives. So the need created a market for 3rd party data storage software, which ended up being called databases. That's about it.

While it may seem that filesystems have certain rules like having files and directories, this is not true. The biggest operating systems work like that, but there are many small OSs that work differently. It's certainly not a hard requirement. (Just remember, to build a new filesystem, you also need to write a new OS, which will make adoption quite a bit harder. Why not focus on just the storage engine and call it a database instead?)

In the end, both databases and filesystems come in all shapes and sizes. Transactional, relational, hierarchical, graph, tabled; whatever you can think of.

[1] I've worked on the Boomla Filesystem which is the storage system behind the Boomla OS & Web Application Platform.

Upvotes: 12

rashedcs

Reputation: 3725

The differences between a file processing system and a database management system are as follows:

  1. A file processing system is a collection of programs that store and manage files on a computer's hard disk, whereas a database management system is a collection of programs that enables you to create and maintain a database.
  2. A file processing system has more data redundancy, whereas a DBMS has less.
  3. A file processing system provides less flexibility in accessing data, whereas a DBMS provides more.
  4. A file processing system does not provide data consistency, whereas a DBMS provides data consistency through normalization.
  5. A file processing system is less complex, whereas a DBMS is more complex.

Upvotes: 10

Antony

Reputation: 4364

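Something one should be aware of is that Unix filesystems have what is called an inode limit: every file consumes an inode, and many filesystems fix the number of inodes when they are created. If you are storing millions of records as individual files, this can be a serious problem. You should run df -i to view the % of inodes used, as this is effectively a limit on the number of files - EVEN IF you have plenty of disk space.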
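
If you would rather check this from code than from the shell, the sketch below reads the same counters through Python's `os.statvfs`, which is available on Unix only. The function name `inode_usage` and the returned dict layout are my own choices; note that some filesystems report zero total inodes, in which case no percentage can be computed.

```python
import os

def inode_usage(path="/"):
    """Return inode counts for the filesystem that contains `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent = (used / total * 100) if total else None
    return {"total": total, "used": used, "free": free, "percent_used": percent}

usage = inode_usage("/")
print(usage)
```

A periodic check like this can warn you before file creation starts failing on a disk that still shows free space.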

Upvotes: 24
