Reputation: 8316
I am making a model in which I have a FileField. I want to store the file content in a database column instead of a file path. Any suggestions?
Upvotes: 12
Views: 9500
Reputation: 17206
This solution is not for most projects. Use django-storages if you're in the cloud; otherwise stick with Django's built-in file system storage. This solution takes some work.
I wanted a way to ship and keep everything absolutely up to date in a clean (and ultimately transaction-accurate) fashion... Usually for small sites this is as simple as git commits or rsync, but for large production deployments, and given the nature of the data (undisclosed), we didn't love the discrepancies between the filesystem and the database.
This is PostgreSQL only, and makes use of Large Objects to avoid bloating individual tables.
Another approach would be to use BinaryFields in the models, but there is a base64 encode/decode CPU-time penalty, and with it the stored values are about 3x their normal base-256 size.
Postgres has its own TOAST mechanism (The Oversized-Attribute Storage Technique); however, that data is still in the original table, albeit in multiple "physical" rows.
A third approach could have been another model to store the data, with access via cursors, using the resulting TOAST format.
This has caveats:
There isn't an included view to download any of the stored files. If you were to write one, you would not get proper MIME types or HTTP resume without additional work
The files are not stored statically, so they won't appear in the typical static or media folders
There is a database wide maximum of (2^32)-1 Large Objects, so keep that in mind
from django.core.files.storage import Storage
from django.core.files.base import File
from django.db import connection
from django.db import transaction
from django.utils.deconstruct import deconstructible


@deconstructible
class PostgreSQLLargeObjectStorage(Storage):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Ensure the mapping table exists
        self._ensure_mapping_table()

    def _ensure_mapping_table(self):
        with connection.cursor() as cursor:
            cursor.execute('''
                CREATE TABLE IF NOT EXISTS storage_largeobject (
                    name VARCHAR PRIMARY KEY,
                    oid OID,
                    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
                    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
                )
            ''')

    def _get_oid(self, name):
        with connection.cursor() as cursor:
            cursor.execute('SELECT oid FROM storage_largeobject WHERE name = %s', [name])
            row = cursor.fetchone()
            return row[0] if row else None

    def _open(self, name, mode='rb'):
        oid = self._get_oid(name)
        if oid is None:
            raise FileNotFoundError(f"No such file: '{name}'")
        return PostgreSQLLargeObjectFile(oid, name, mode)

    def _save(self, name, content):
        with connection.cursor() as cursor:
            # Create a new Large Object
            lo = connection.connection.lobject(0, 'wb')
            oid = lo.oid
            # Write content to the Large Object in chunks
            while True:
                chunk = content.read(65536)
                if not chunk:
                    break
                lo.write(chunk)
            lo.close()
            # Insert or update the mapping
            cursor.execute('''
                INSERT INTO storage_largeobject (name, oid, created_at, updated_at)
                VALUES (%s, %s, NOW(), NOW())
                ON CONFLICT (name) DO UPDATE SET oid = EXCLUDED.oid, updated_at = NOW()
            ''', [name, oid])
        return name

    def delete(self, name):
        oid = self._get_oid(name)
        if oid is None:
            return
        with connection.cursor() as cursor:
            cursor.execute('SELECT lo_unlink(%s)', [oid])
            cursor.execute('DELETE FROM storage_largeobject WHERE name = %s', [name])

    def exists(self, name):
        return self._get_oid(name) is not None

    def listdir(self, path):
        directories, files = [], []
        with connection.cursor() as cursor:
            cursor.execute('SELECT name FROM storage_largeobject WHERE name LIKE %s', [f'{path}%'])
            rows = cursor.fetchall()
        for (name,) in rows:
            subpath = name[len(path):]
            parts = subpath.lstrip('/').split('/', 1)
            if len(parts) == 1 and parts[0]:
                files.append(parts[0])
            elif len(parts) > 1 and parts[0]:
                if parts[0] not in directories:
                    directories.append(parts[0])
        return directories, files

    @transaction.atomic
    def size(self, name):
        oid = self._get_oid(name)
        if oid is None:
            raise FileNotFoundError(f"No such file: '{name}'")
        lo = connection.connection.lobject(oid, 'rb')
        lo.seek(0, 2)  # Seek to the end
        size = lo.tell()
        lo.close()
        return size

    def url(self, name):
        # Not implemented since it's database storage... A view could return the content instead.
        return name
        # return None  # This would hide the name in the admin, which we don't want

    def get_accessed_time(self, name):
        with connection.cursor() as cursor:
            cursor.execute('SELECT updated_at FROM storage_largeobject WHERE name = %s', [name])
            row = cursor.fetchone()
        if row:
            return row[0]
        raise FileNotFoundError(f"No such file: '{name}'")

    def get_created_time(self, name):
        with connection.cursor() as cursor:
            cursor.execute('SELECT created_at FROM storage_largeobject WHERE name = %s', [name])
            row = cursor.fetchone()
        if row:
            return row[0]
        raise FileNotFoundError(f"No such file: '{name}'")

    def get_modified_time(self, name):
        return self.get_accessed_time(name)
from django.core.files.base import File
from django.db import connection, transaction


class PostgreSQLLargeObjectFile(File):
    def __init__(self, oid, name, mode='rb'):
        self.name = name  # The name of the file
        self.oid = oid  # The OID of the large object
        self.mode = mode
        self._is_dirty = False
        self._lo = None
        self._opened = False
        self._size = None  # Cache the size of the file

    @transaction.atomic
    def open(self, mode=None):
        if not self._opened:
            if mode:
                self.mode = mode
            self._opened = True
            # Open the large object
            self._lo = connection.connection.lobject(self.oid, self.mode)

    def close(self):
        if self._opened:
            if self._lo is not None:
                self._lo.close()
                self._lo = None
            self._opened = False

    @property
    def size(self):
        # Note: no @transaction.atomic here; stacking it above @property
        # breaks the descriptor, and seek/tell are already atomic.
        if self._size is None:
            # Save the current position
            pos = self.tell()
            # Seek to the end to get the size
            self.seek(0, 2)
            self._size = self.tell()
            # Return to the original position
            self.seek(pos)
        return self._size

    @property
    def closed(self):
        return not self._opened

    @transaction.atomic
    def read(self, size=-1):
        self.open()
        return self._lo.read(size)

    @transaction.atomic
    def write(self, data):
        self.open()
        self._is_dirty = True
        return self._lo.write(data)

    @transaction.atomic
    def seek(self, offset, whence=0):
        self.open()
        return self._lo.seek(offset, whence)

    @transaction.atomic
    def tell(self):
        self.open()
        return self._lo.tell()

    def flush(self):
        # Large Object writes go straight to the server; nothing to buffer locally.
        if self._is_dirty:
            self._is_dirty = False
You can use it with something such as the following:
from django.db import models
from django.db.models import FileField
from django.db.models.fields.files import FileDescriptor
from django.db.models.signals import post_delete
from django.dispatch import receiver


class TrackingFileDescriptor(FileDescriptor):
    def __set__(self, instance, value):
        # Get the current value of the file field
        current_value = instance.__dict__.get(self.field.name)
        current_name = None if not current_value else current_value.name
        new_name = value
        if current_name != new_name:
            if current_name is not None and len(current_name) > 0:
                instance.file_will_change(self.field.name, current_name, value)
        super().__set__(instance, value)


class TrackingFileField(FileField):
    def contribute_to_class(self, cls, name, **kwargs):
        super().contribute_to_class(cls, name, **kwargs)
        setattr(cls, self.name, TrackingFileDescriptor(self))


class TestModel(models.Model):
    file = TrackingFileField(blank=True, null=True)

    def file_will_change(self, field_name, old_value, new_value):
        print(f"The file '{field_name}' is about to be cleared, or change from '{old_value}' to '{new_value}'")
        if old_value is not None:
            self.file.delete()


@receiver(post_delete, sender=TestModel)
def delete_file(sender, instance, **kwargs):
    if instance.file:
        instance.file.delete(save=False)
And in settings.py:
# Database file storage
DEFAULT_FILE_STORAGE = 'liberty.storage.postgres.PostgreSQLLargeObjectStorage'
Upvotes: 0
Reputation: 64739
I know this is an old question, but there's been some good code written since then to allow this option. Specifically, see django-database-files, which uses Django's storage API to make all FileFields and ImageFields store their contents in the database. There's also a fork that caches the files locally on the filesystem, to overcome the biggest caveat of using the database, which is latency.
Upvotes: 11
Reputation: 11079
Disregard the naysayers. If you want to have full control over your content, put the files in a blob field in the database. I generally also keep the filename in a separate field, so I can reconstruct the file as necessary (that way you keep the extension, which ties it to a file type in most operating systems).
Be sure to store the actual blob data in a separate table, only connected to your filename / extra info table via an id ... that way you don't sacrifice any performance when dealing with any information related to the file other than the content itself.
What the naysayers fail to realize, is that databases are simply an extremely optimized form of file system. Bytes are bytes and disc sectors are disc sectors. Databases are simply much better at organizing and searching those bytes than file systems are. Not to mention, databases implement much more stringent security than most file systems and are better maintained (with backups, support staff etc.).
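As a sketch of that two-table layout (hypothetical table and column names; SQLite is used here only so the example is self-contained, but the shape is the same in any relational database):

```python
import sqlite3

# Metadata lives in file_info, raw bytes in file_content, linked by id,
# so queries about files never have to read the content itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_content (
        id INTEGER PRIMARY KEY,
        data BLOB NOT NULL
    );
    CREATE TABLE file_info (
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        content_id INTEGER NOT NULL REFERENCES file_content(id)
    );
""")

data = b"%PDF-1.4 example bytes"
cur = conn.execute("INSERT INTO file_content (data) VALUES (?)", (data,))
conn.execute(
    "INSERT INTO file_info (filename, content_id) VALUES (?, ?)",
    ("report.pdf", cur.lastrowid),
)

# Listing files touches only the small metadata table.
names = [row[0] for row in conn.execute("SELECT filename FROM file_info")]

# The blob is fetched only when the content is actually needed.
blob = conn.execute(
    "SELECT c.data FROM file_content c"
    " JOIN file_info i ON i.content_id = c.id"
    " WHERE i.filename = ?",
    ("report.pdf",),
).fetchone()[0]
```

Keeping the filename (and thus the extension) in `file_info` is what lets you reconstruct the file later.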
Upvotes: 24
Reputation: 8316
It is very easy: just override the save method in the admin.
    def save_model(self, request, obj, form, change):
        super().save_model(request, obj, form, change)
        filecontent = form.cleaned_data.get('upload_file')
        data = filecontent.read()
        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute(
                "UPDATE filecontent SET filecontent = %s WHERE id = %s",
                [data, obj.id],
            )
This will store the file content in the filecontent column of the filecontent table.
Upvotes: -1
Reputation: 4870
Well, how about simply storing it in a binary column? You can then store the collection of bytes. And if the filename is important to you as well, you can store that in an additional name column.
Upvotes: 1