Reputation: 8316
I am making a model in which I have a FileField. I want to store the file content in a database column instead of a file path. Any suggestions?
Upvotes: 12
Views: 9500
Reputation: 17206
This solution is not for most projects. Use django-storages if you're in the cloud; otherwise stick with Django's built-in file system storage. This solution takes some work.
I wanted a way to ship and keep everything absolutely up to date in a clean (and ultimately transaction-accurate) fashion... Usually for small sites this is as simple as git commits or rsync, but for large production deployments, and given the nature of the data (undisclosed), we didn't love the discrepancies between the filesystem and the database.
This is PostgreSQL only, and makes use of Large Objects to avoid bloating individual tables.
Another approach would be to use BinaryFields in the models, but there is a base64 encode/decode CPU-time penalty, and with it the stored values are about 3x their normal base-256 size.
Postgres has its own TOAST mechanism (The Oversized-Attribute Storage Technique); however, that data is still in the original table, albeit in multiple "physical" rows.
A third approach could have been another model to store the data, with access via cursors, using the resulting TOAST format.
This has caveats:
There isn't an included view to download any of the stored files. If you were to write one, you would not get proper MIME types or HTTP resume without additional work
The files are not stored statically, so they won't appear in the typical static or media folders
There is a database wide maximum of (2^32)-1 Large Objects, so keep that in mind
from django.core.files.storage import Storage
from django.core.files.base import File
from django.db import connection
from django.db import transaction
from django.utils.deconstruct import deconstructible


@deconstructible
class PostgreSQLLargeObjectStorage(Storage):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Ensure the mapping table exists
        self._ensure_mapping_table()

    def _ensure_mapping_table(self):
        with connection.cursor() as cursor:
            cursor.execute('''
                CREATE TABLE IF NOT EXISTS storage_largeobject (
                    name VARCHAR PRIMARY KEY,
                    oid OID,
                    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
                    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
                )
            ''')

    def _get_oid(self, name):
        with connection.cursor() as cursor:
            cursor.execute('SELECT oid FROM storage_largeobject WHERE name = %s', [name])
            row = cursor.fetchone()
            return row[0] if row else None

    def _open(self, name, mode='rb'):
        oid = self._get_oid(name)
        if oid is None:
            raise FileNotFoundError(f"No such file: '{name}'")
        return PostgreSQLLargeObjectFile(oid, name, mode)

    def _save(self, name, content):
        with connection.cursor() as cursor:
            # Create a new Large Object
            lo = connection.connection.lobject(0, 'wb')
            oid = lo.oid
            # Write content to the Large Object in chunks
            while True:
                chunk = content.read(65536)
                if not chunk:
                    break
                lo.write(chunk)
            lo.close()
            # Insert or update the mapping
            cursor.execute('''
                INSERT INTO storage_largeobject (name, oid, created_at, updated_at)
                VALUES (%s, %s, NOW(), NOW())
                ON CONFLICT (name) DO UPDATE SET oid = EXCLUDED.oid, updated_at = NOW()
            ''', [name, oid])
        return name

    def delete(self, name):
        oid = self._get_oid(name)
        if oid is None:
            return
        with connection.cursor() as cursor:
            cursor.execute('SELECT lo_unlink(%s)', [oid])
            cursor.execute('DELETE FROM storage_largeobject WHERE name = %s', [name])

    def exists(self, name):
        return self._get_oid(name) is not None

    def listdir(self, path):
        directories, files = [], []
        with connection.cursor() as cursor:
            cursor.execute('SELECT name FROM storage_largeobject WHERE name LIKE %s', [f'{path}%'])
            rows = cursor.fetchall()
        for (name,) in rows:
            subpath = name[len(path):]
            parts = subpath.lstrip('/').split('/', 1)
            if len(parts) == 1 and parts[0]:
                files.append(parts[0])
            elif len(parts) > 1 and parts[0]:
                if parts[0] not in directories:
                    directories.append(parts[0])
        return directories, files

    @transaction.atomic
    def size(self, name):
        oid = self._get_oid(name)
        if oid is None:
            raise FileNotFoundError(f"No such file: '{name}'")
        lo = connection.connection.lobject(oid, 'rb')
        lo.seek(0, 2)  # Seek to the end
        size = lo.tell()
        lo.close()
        return size

    def url(self, name):
        # Not implemented since it's database storage... A view could return the content instead.
        return name
        # return None  # This would hide the name in the admin, which we don't want

    def get_accessed_time(self, name):
        with connection.cursor() as cursor:
            cursor.execute('SELECT updated_at FROM storage_largeobject WHERE name = %s', [name])
            row = cursor.fetchone()
        if row:
            return row[0]
        raise FileNotFoundError(f"No such file: '{name}'")

    def get_created_time(self, name):
        with connection.cursor() as cursor:
            cursor.execute('SELECT created_at FROM storage_largeobject WHERE name = %s', [name])
            row = cursor.fetchone()
        if row:
            return row[0]
        raise FileNotFoundError(f"No such file: '{name}'")

    def get_modified_time(self, name):
        return self.get_accessed_time(name)
from django.core.files.base import File
from django.db import connection, transaction


class PostgreSQLLargeObjectFile(File):
    def __init__(self, oid, name, mode='rb'):
        self.name = name  # The name of the file
        self.oid = oid  # The OID of the large object
        self.mode = mode
        self._is_dirty = False
        self._lo = None
        self._opened = False
        self._size = None  # Cache the size of the file

    @transaction.atomic
    def open(self, mode=None):
        if not self._opened:
            if mode:
                self.mode = mode
            self._opened = True
            # Open the large object
            self._lo = connection.connection.lobject(self.oid, self.mode)

    def close(self):
        if self._opened:
            if self._lo is not None:
                self._lo.close()
                self._lo = None
            self._opened = False

    @property
    def size(self):
        # Note: no @transaction.atomic here; stacking it above @property
        # breaks the descriptor, and seek/tell are already atomic.
        if self._size is None:
            # Save the current position
            pos = self.tell()
            # Seek to the end to get the size
            self.seek(0, 2)
            self._size = self.tell()
            # Return to the original position
            self.seek(pos)
        return self._size

    @property
    def closed(self):
        return not self._opened

    @transaction.atomic
    def read(self, size=-1):
        self.open()
        return self._lo.read(size)

    @transaction.atomic
    def write(self, data):
        self.open()
        self._is_dirty = True
        return self._lo.write(data)

    @transaction.atomic
    def seek(self, offset, whence=0):
        self.open()
        return self._lo.seek(offset, whence)

    @transaction.atomic
    def tell(self):
        self.open()
        return self._lo.tell()

    def flush(self):
        # Large Object writes go straight to the server; nothing to buffer locally.
        if self._is_dirty:
            self._is_dirty = False
You can use it with something such as the following:
from django.db import models
from django.db.models import FileField
from django.db.models.fields.files import FileDescriptor
from django.db.models.signals import post_delete
from django.dispatch import receiver


class TrackingFileDescriptor(FileDescriptor):
    def __set__(self, instance, value):
        # Get the current value of the file field
        current_value = instance.__dict__.get(self.field.name)
        current_name = None if not current_value else current_value.name
        new_name = value
        if current_name != new_name:
            if current_name is not None and len(current_name) > 0:
                instance.file_will_change(self.field.name, current_name, value)
        super().__set__(instance, value)


class TrackingFileField(FileField):
    def contribute_to_class(self, cls, name, **kwargs):
        super().contribute_to_class(cls, name, **kwargs)
        setattr(cls, self.name, TrackingFileDescriptor(self))


class TestModel(models.Model):
    file = TrackingFileField(blank=True, null=True)

    def file_will_change(self, field_name, old_value, new_value):
        print(f"The file '{field_name}' is about to be cleared, or change from '{old_value}' to '{new_value}'")
        if old_value is not None:
            self.file.delete()


@receiver(post_delete, sender=TestModel)
def delete_file(sender, instance, **kwargs):
    if instance.file:
        instance.file.delete(save=False)
And in settings.py:
# Database file storage
DEFAULT_FILE_STORAGE = 'liberty.storage.postgres.PostgreSQLLargeObjectStorage'
Upvotes: 0
Reputation: 64739
I know this is an old question, but there's been some good code written since then to allow this option. Specifically, see django-database-files, which uses Django's storage API to make all FileFields and ImageFields store their contents in the database. There's also a fork that caches the files locally on the filesystem, to overcome the biggest caveat of using the database, which is latency.
Upvotes: 11
Reputation: 11079
Disregard the naysayers. If you want to have full control over your content, put the files in a blob field in the database. I generally also keep the filename in a separate field, so I can reconstruct the file as necessary (that way you keep the extension, which ties it to a file type in most operating systems).
Be sure to store the actual blob data in a separate table, only connected to your filename / extra info table via an id ... that way you don't sacrifice any performance when dealing with any information related to the file other than the content itself.
What the naysayers fail to realize, is that databases are simply an extremely optimized form of file system. Bytes are bytes and disc sectors are disc sectors. Databases are simply much better at organizing and searching those bytes than file systems are. Not to mention, databases implement much more stringent security than most file systems and are better maintained (with backups, support staff etc.).
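As a sketch of that two-table layout (hypothetical table and column names; SQLite is used here only so the example is self-contained, but the shape is the same in any relational database):

```python
import sqlite3

# Metadata lives in file_info, raw bytes in file_content, linked by id,
# so queries about files never have to read the content itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_content (
        id INTEGER PRIMARY KEY,
        data BLOB NOT NULL
    );
    CREATE TABLE file_info (
        id INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        content_id INTEGER NOT NULL REFERENCES file_content(id)
    );
""")

data = b"%PDF-1.4 example bytes"
cur = conn.execute("INSERT INTO file_content (data) VALUES (?)", (data,))
conn.execute(
    "INSERT INTO file_info (filename, content_id) VALUES (?, ?)",
    ("report.pdf", cur.lastrowid),
)

# Listing files touches only the small metadata table.
names = [row[0] for row in conn.execute("SELECT filename FROM file_info")]

# The blob is fetched only when the content is actually needed.
blob = conn.execute(
    "SELECT c.data FROM file_content c"
    " JOIN file_info i ON i.content_id = c.id"
    " WHERE i.filename = ?",
    ("report.pdf",),
).fetchone()[0]
```

Keeping the filename (and thus the extension) in `file_info` is what lets you reconstruct the file later.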
Upvotes: 24
Reputation: 8316
It is very easy: just override the save method in the admin.
    def save_model(self, request, obj, form, change):
        super().save_model(request, obj, form, change)
        filecontent = form.cleaned_data.get('upload_file')
        data = filecontent.read()
        from django.db import connection
        with connection.cursor() as cursor:
            cursor.execute(
                "UPDATE filecontent SET filecontent = %s WHERE id = %s",
                [data, obj.id],
            )
This will store the file content in the filecontent column of the filecontent table.
Upvotes: -1
Reputation: 4870
Well, how about simply storing it in a binary column? You can then store the collection of bytes. And if the filename is important to you as well, you can store that in an additional name column.
Upvotes: 1