Reputation: 51
BACKSTORY
I maintain and am in the process of re-engineering a couple of PHP based webapplications, and there is one subject i haven't found an elegant solution for yet, so i am looking for some input which might lead me to a better way to do it.
CURRENT STATE
Several of my applications allow the users to store images in addition to lots of data. All the data ends up in a PostgreSQL cluster, however i choose not to store the images themselves in the database for performance and maintainability sake. The images get their metadata stored in the database (such as original filename, width/height, etc) and once the database transaction succeeded, i move the image on the filesystem into an image directory (stored as .jpg).
THE PROBLEM
All of this functions fine, but as the applications are used a lot, and by multiple people simultaniously, and over the internet, and PHP's error/exception handling isn't exactly the most reliable in all scenarios, i am occasionally worried about not being able to wrap storing the image (on the Filesystem) inside the Database's transaction (since its happening on the filesystem). I'm also worried because if an image file gets corrupted/altered/deleted on the filesystem, the database's records will not be properly updated (no referential integrity).
SOLUTIONS
What i've come up with so far is:
Option A) Store the actual image (not just metadata, but the whole binary) in the database. -- I'm no fan of this since currently the database, while being quite complex, is still very small (No more than 60MB orso). The related images total many many GB's so it would increase the footprint of my PostgreSQL installation in a massive way. Additionally, it will complicate my database backup and replication scenarios.
Option B) Keep the current design (images on filesystem, data in postgres) and just try to account for corrupt data at the application level at every point where it gets used. -- It makes the application much more complex and errorprone.
Option C) I found a PHP ORM framework called Flourishlib which contains a filesystem class that simulates Filesystem-transactions (basicly if you call $file->rename() it checks if that would be possible, but doesnt actually rename until you commit the transaction) -- This is the best solution i found so far, however i'm already using another ORM framework (Propel) which i like a lot more for a project this size, so i would be requiring 2 frameworks with largely overlapping functionality.
Sooo
So, i'm thinking many other people here will have run into this same "problem" before, and i'm sure some came up with some solutions i haven't thought of yet. Appreciate any pointers, advice or critique.
Upvotes: 5
Views: 1712
Reputation: 41
In my opinion these are two separate problems.
First one: How you guarantee integrety, which you already solved somehow. The only thing i would consider is performing the file system operation during the transaction of the db and rollback if something goes wrong. The tradeof here is performance since filesystem operations are rather slow but not that slow ;) You could try it...
Second one: How you keep the integrety after external file operations. Here i would suggest taking a look at inotofy with php PHPInotify. It allows you to implement an observer pattern in order to get notified when something changes on the filesystem.
Upvotes: 1
Reputation: 529
Here is my suggestion of an option D:
Store the actual image in the database (the whole binary) with its metadata and hash (see What is image hashing used for?).
Build a microservice in charge of converting binary image from your database to your filesystem or CDN. By comparing hashes, this microservice can check image integrity. It can even take care of storing previous versions and logs. Once transaction is done, binary data from the database can be deleted to keep it lightweight.
Designing a message queue architecture (with Amazon SQS for example) to launch and manage this microservice. It will run independently from your main application and will be ready to deal with failures, database maintenance, errors, etc.
Hope this helps, even after 8 years.
Upvotes: 1
Reputation: 613
You can always grab a subset of Flourish from the Advanced Download page. Just select fFile and it will select the dependencies. Unfortunately the automatic dependency detection has grow a little inaccurate over time (so it will include fEmail which is really optional), but you can delete that, leaving you with some filesystem classes and some core/exception stuff.
Upvotes: 0