James Hall
James Hall

Reputation: 6689

Storing Images in DB - Yea or Nay?

So I'm using an app that stores images heavily in the DB. What's your outlook on this? I'm more of a type to store the location in the filesystem, than store it directly in the DB.

What do you think are the pros/cons?

Upvotes: 415

Views: 884426

Answers (30)

Pablo
Pablo

Reputation: 1

I will go for both solution, I mean...I will develop a litle component (EJB) that store the images in a DB plus the path of this image into the server. This DB only will be updated if we have a new image or the original image it's updated. Then I will also store the path in the business DB.

From an application point of view, I will always user the file system (retrieving the path from th business DB) and by this way we will fix the backup issue, and also avoid possible performance issues.

The only weakness is that we will store the same image 2 times...the good point is that the memory is cheap, come on!.

Upvotes: 0

ila
ila

Reputation: 4724

Im my experience I had to manage both situations: images stored in database and images on the file system with path stored in db.

The first solution, images in database, is somewhat "cleaner" as your data access layer will have to deal only with database objects; but this is good only when you have to deal with low numbers.

Obviously database access performance when you deal with binary large objects is degrading, and the database dimensions will grow a lot, causing again performance loss... and normally database space is much more expensive than file system space.

On the other hand having large binary objects stored in file system will cause you to have backup plans that have to consider both database and file system, and this can be an issue for some systems.

Another reason to go for file system is when you have to share your images data (or sounds, video, whatever) with third party access: in this days I'm developing a web app that uses images that have to be accessed from "outside" my web farm in such a way that a database access to retrieve binary data is simply impossible. So sometimes there are also design considerations that will drive you to a choice.

Consider also, when making this choice, if you have to deal with permission and authentication when accessing binary objects: these requisites normally can be solved in an easier way when data are stored in db.

Upvotes: 6

visakh
visakh

Reputation: 2553

If you are on Teradata, then Teradata Developer Exchange has a detailed article on loading and retrieving lobs and blobs..

http://developer.teradata.com/applications/articles/large-objects-part-1-loading

Upvotes: 0

Ray
Ray

Reputation: 46595

This might be a bit of a long shot, but if you're using (or planning on using) SQL Server 2008 I'd recommend having a look at the new FileStream data type.

FileStream solves most of the problems around storing the files in the DB:

  1. The Blobs are actually stored as files in a folder.
  2. The Blobs can be accessed using either a database connection or over the filesystem.
  3. Backups are integrated.
  4. Migration "just works".

However SQL's "Transparent Data Encryption" does not encrypt FileStream objects, so if that is a consideration, you may be better off just storing them as varbinary.

From the MSDN Article:

Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

Upvotes: 56

user13550
user13550

Reputation: 71

Here's an interesting white paper on the topic.

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem

The answer is "It depends." Certainly it would depend upon the database server and its approach to blob storage. It also depends on the type of data being stored in blobs, as well as how that data is to be accessed.

Smaller sized files can be efficiently stored and delivered using the database as the storage mechanism. Larger files would probably be best stored using the file system, especially if they will be modified/updated often. (blob fragmentation becomes an issue in regards to performance.)

Here's an additional point to keep in mind. One of the reasons supporting the use of a database to store the blobs is ACID compliance. However, the approach that the testers used in the white paper, (Bulk Logged option of SQL Server,) which doubled SQL Server throughput, effectively changed the 'D' in ACID to a 'd,' as the blob data was not logged with the initial writes for the transaction. Therefore, if full ACID compliance is an important requirement for your system, halve the SQL Server throughput figures for database writes when comparing file I/O to database blob I/O.

Upvotes: 26

TJez
TJez

Reputation: 2029

For a large number of small images, the database might be better.

I had an application with many small thumbnails (2Kb each). When I put them on the filesystem, they each consumed 8kb, due to the filesystem's blocksize. A 400% increase in space!

See this post for more information on block size: What is the block size of the iphone filesystem?

Upvotes: 0

Rad
Rad

Reputation: 8379

As with most issues, it's not as simple as it sounds. There are cases where it would make sense to store the images in the database.

  • You are storing images that are changing dynamically, say invoices and you wanted to get an invoice as it was on 1 Jan 2007?
  • The government wants you to maintain 6 years of history
  • Images stored in the database do not require a different backup strategy. Images stored on filesystem do
  • It is easier to control access to the images if they are in a database. Idle admins can access any folder on disk. It takes a really determined admin to go snooping in a database to extract the images

On the other hand there are problems associated

  • Require additional code to extract and stream the images
  • Latency may be slower than direct file access
  • Heavier load on the database server

Upvotes: 140

Mark Harrison
Mark Harrison

Reputation: 304654

I'm in charge of some applications that manage many TB of images. We've found that storing file paths in the database to be best.

There are a couple of issues:

  • database storage is usually more expensive than file system storage
  • you can super-accelerate file system access with standard off the shelf products
    • for example, many web servers use the operating system's sendfile() system call to asynchronously send a file directly from the file system to the network interface. Images stored in a database don't benefit from this optimization.
  • things like web servers, etc, need no special coding or processing to access images in the file system
  • databases win out where transactional integrity between the image and metadata are important.
    • it is more complex to manage integrity between db metadata and file system data
    • it is difficult (within the context of a web application) to guarantee data has been flushed to disk on the filesystem

Upvotes: 350

mloskot
mloskot

Reputation: 38940

As someone mentioned already, "it depends". If storage in a database is supposed to be a 1-to-1 fancy replacement for filesystem, it may not be quite a best option.

However, if a database backend will provide additional values, not only a serialization and storage of a blob, then it may make a real sense.

You may take a look at WKT Raster which is a project aiming at developing raster support in PostGIS which in turn serves as a geospatial extension for PostgreSQL database system. Idea behind the WKT Raster is not only to define a format for raster serialization and storage (using PostgreSQL system), but, what's much more important than storage, is to specify database-side efficient image processing accessible from SQL. Long story short, the idea is to move the operational weight from client to database backend, so it take places as close to storage itself as possible. The WKT Raster, as PostGIS, is dedicate to applications of specific domain, GIS.

For more complete overview, check the website and presentation (PDF) of the system.

Upvotes: 1

In a previous project i stored images on the filesystem, and that caused a lot of headaches with backups, replication, and the filesystem getting out of sync with the database.

In my latest project i'm storing images in the database, and caching them on the filesystem, and it works really well. I've had no problems so far.

Upvotes: 4

mluebke
mluebke

Reputation: 8819

In places where you MUST guarantee referential integrity and ACID compliance, storing images in the database is required.

You cannot transactionaly guarantee that the image and the meta-data about that image stored in the database refer to the same file. In other words, it is impossible to guarantee that the file on the filesystem is only ever altered at the same time and in the same transaction as the metadata.

Upvotes: 30

shane
shane

Reputation:

I have worked with many digital storage systems and they all store digital objects on the file system. They tend to use a branch approach, so there will be an archive tree on the file system, often starting with year of entry e.g. 2009, subdirectory will be month e.g. 8 for August, next directory will be day e.g. 11 and sometimes they will use hour as well, the file will then be named with the records persistent ID. Using BLOBS has its advantages and I have heard of it being used often in the IT parts of the chemical industry for storing thousands or millions of photographs and diagrams. It can provide more granular security, a single method of backup, potentially better data integrity and improved inter media searching, Oracle has many features for this within the package they used to call Intermedia (I think it is called something else now). The file system can also have granular security provided through a system such as XACML or another XML type security object. See D Space of Fedora Object Store for examples.

Upvotes: 0

SpliFF
SpliFF

Reputation: 39014

I'd almost never store them in the DB. The best approach is usually to store your images in a path controlled by a central configuration variable and name the images according to the DB table and primary key (if possible). This gives you the following advantages:

  • Move your images to another partition or server just by updating the global config.
  • Find the record matching the image by searching on its primary key.
  • Your images are accessable to processing tools like imagemagick.
  • In web-apps your images can be handled by your webserver directly (saving processing).
  • CMS tools and web languages like Coldfusion can handle uploading natively.

Upvotes: 0

Pure.Krome
Pure.Krome

Reputation: 87047

Assumption: Application is web enabled/web based

I'm surprised no one has really mentioned this ... delegate it out to others who are specialists -> use a 3rd party image/file hosting provider.

Store your files on a paid online service like

Another StackOverflow threads talking about this here.

This thread explains why you should use a 3rd party hosting provider.

It's so worth it. They store it efficiently. No bandwith getting uploaded from your servers to client requests, etc.

Upvotes: 11

Salman Arshad
Salman Arshad

Reputation: 272376

Storing an image in the database still means that the image data ends up somewhere in the file system but obscured so that you cannot access it directly.

+ves:

  • database integrity
  • its easy to manage since you don't have to worry about keeping the filesystem in sync when an image is added or deleted

-ves:

  • performance penalty -- a database lookup is usually slower that a filesystem lookup
  • you cannot edit the image directly (crop, resize)

Both methods are common and practiced. Have a look at the advantages and disadvantages. Either way, you'll have to think about how to overcome the disadvantages. Storing in database usually means tweaking database parameters and implement some kind of caching. Using filesystem requires you to find some way of keeping filesystem+database in sync.

Upvotes: 3

wiseguy
wiseguy

Reputation: 149

The problem with storing only filepaths to images in a database is that the database's integrity can no longer be forced.

If the actual image pointed to by the filepath becomes unavailable, the database unwittingly has an integrity error.

Given that the images are the actual data being sought after, and that they can be managed easier (the images won't suddenly disappear) in one integrated database rather than having to interface with some kind of filesystem (if the filesystem is independently accessed, the images MIGHT suddenly "disappear"), I'd go for storing them directly as a BLOB or such.

Upvotes: 20

cherouvim
cherouvim

Reputation: 31928

Database for data

Filesystem for files

Upvotes: 0

Cygwin98
Cygwin98

Reputation: 520

I would go with the file system approach, primarily due to its better flexibility. Consider that if the number of images gets huge, one database may not be able to handle it. With file system, you can simple add more file servers, assuming that you're using NFS or kind.

Another advantage the file system approach has is to be able to do some fancy stuffs, such as you can use Amazon S3 as the primary storage (save the url in the database instead of file path). In case of outages happen to S3, you fall back to your file server (may be another database entry containing the file path). Some voodoo to apply to Apache or whatever web server you're using.

Upvotes: 0

too much php
too much php

Reputation: 91068

I have recently created a PHP/MySQL app which stores PDFs/Word files in a MySQL table (as big as 40MB per file so far).

Pros:

  • Uploaded files are replicated to backup server along with everything else, no separate backup strategy is needed (peace of mind).
  • Setting up the web server is slightly simpler because I don't need to have an uploads/ folder and tell all my applications where it is.
  • I get to use transactions for edits to improve data integrity - I don't have to worry about orphaned and missing files

Cons:

  • mysqldump now takes a looooong time because there is 500MB of file data in one of the tables.
  • Overall not very memory/cpu efficient when compared to filesystem

I'd call my implementation a success, it takes care of backup requirements and simplifies the layout of the project. The performance is fine for the 20-30 people who use the app.

Upvotes: 7

Draemon
Draemon

Reputation: 34721

Something nobody has mentioned is that the DB guarantees atomic actions, transactional integrity and deals with concurrency. Even referentially integrity is out of the window with a filesystem - so how do you know your file names are really still correct?

If you have your images in a file-system and someone is reading the file as you're writing a new version or even deleting the file - what happens?

We use blobs because they're easier to manage (backup, replication, transfer) too. They work well for us.

Upvotes: 22

Craig
Craig

Reputation: 36856

If you are planning a public facing web site then you should not go with either option. Your should use a Content Delivery Network (CDN). There are price, scalability and speed advantages to a CDN when delivering a large amount of static content over the internet.

Upvotes: -1

dan90266
dan90266

Reputation: 1067

We have implemented a document imaging system that stores all it's images in SQL2005 blob fields. There are several hundred GB at the moment and we are seeing excellent response times and little or no performance degradation. In addition, fr regulatory compliance, we have a middleware layer that archives newly posted documents to an optical jukebox system which exposes them as a standard NTFS file system.

We've been very pleased with the results, particularly with respect to:

  1. Ease of Replication and Backup
  2. Ability to easily implement a document versioning system

Upvotes: 13

David Robbins
David Robbins

Reputation: 10046

Images on a file store are the best bet, and supplement this with storing the meta data in a database. From a web server perspective, the fast way to serve stuff up is to point to it directly. If it's in the database - ala Sharepoint - you have the overhead of ADO.Net to pull it out, stream it, etc.

Documentum - while bloated and complicated - has it right in that the files are out on the share and available for you to determine how to store them - disk on the server, SAN, NAS, whatever. The Documentum strategy is to store the files a tree structure by encoding the folders and file names according to their primary key in the DB. The DB becomes the resource for knowing what files are what and for enforcing security. For high volume systems this type of approach is a good way to go.

Also consider this when dealing with metadata: should you ever need to update the attributes of your meta data corpus, the DB is your friend as you can quickly perform the updates with SQL. With other tagging systems you do not have the easy data manipulation tools at hand

Upvotes: -1

siculars
siculars

Reputation:

In my little application I have at least a million files weighing in at about 200GB at last count. All the files are sitting in an XFS file system mounted on a linux server over iscsi. The paths are stored in the database. use some kind of intelligent naming convention for your file paths and file names.

IMHO, use the file system for what it was meant to do - store files. Databases generally do not offer you any advantage over a standard file system in storing binary data.

Upvotes: -1

Randy Orrison
Randy Orrison

Reputation: 1259

In my current application, I'm doing both. When the user identifies an image to attach to a record, I use ImageMagick to resize it to an appropriate size for display on screen (about 300x300 for my application) and store that in the database for ease of access, but then also copy the user's original file to a network share so that it's available for applications that require higher resolution (like printing).

(There are a couple other factors involved as well: Navision will only display BMPs, so when I resize it I also convert to BMP for storage, and the database is replicated to remote sites where it's useful to be able to display the image. Printing is only done at the head office, so I don't need to replicate the original file.)

Upvotes: -1

nikhilbelsare
nikhilbelsare

Reputation: 101

It depends on the number of images you are going to store and also their sizes. I have used databases to store images in the past and my experience has been fairly good.

IMO, Pros of using database to store images are,

A. You don't need FS structure to hold your images
B. Database indexes perform better than FS trees when more number of items are to be stored
C. Smartly tuned database perform good job at caching the query results
D. Backups are simple. It also works well if you have replication set up and content is delivered from a server near to user. In such cases, explicit synchronization is not required.

If your images are going to be small (say < 64k) and the storage engine of your db supports inline (in record) BLOBs, it improves performance further as no indirection is required (Locality of reference is achieved).

Storing images may be a bad idea when you are dealing with small number of huge sized images. Another problem with storing images in db is that, metadata like creation, modification dates must handled by your application.

Upvotes: 7

Tammen
Tammen

Reputation: 4006

I would go with the file system approach. No need to create or maintain a DB with images, it will save you some major headaches in the long run.

Upvotes: 0

Tim Howland
Tim Howland

Reputation: 7970

I prefer to store image paths in the DB and images on the filesystem (with rsync between servers to keep everything reasonably current).

However, some of the content-management-system stuff I do needs the images in the CMS for several reasons- visibility control (so the asset is held back until the press release goes out), versioning, reformatting (some CMS's will dynamically resize for thumbnails )and ease of use for linking the images into the WYSIWYG pages.

So the rule of thumb for me is to always stash application stuff on the filesystem, unless it's CMS driven.

Upvotes: 0

a7drew
a7drew

Reputation: 7811

If you're not on SQL Server 2008 and you have some solid reasons for putting specific image files in the database, then you could take the "both" approach and use the file system as a temporary cache and use the database as the master repository.

For example, your business logic can check if an image file exists on disc before serving it up, retrieving from the database when necessary. This buys you the capability of multiple web servers and fewer sync issues.

Upvotes: 10

dicroce
dicroce

Reputation: 46820

The trick here is to not become a zealot.

One thing to note here is that no one in the pro file system camp has listed a particular file system. Does this mean that everything from FAT16 to ZFS handily beats every database?

No.

The truth is that many databases beat many files systems, even when we're only talking about raw speed.

The correct course of action is to make the right decision for your precise scenario, and to do that, you'll need some numbers and some use case estimates.

Upvotes: 31

Related Questions