Reputation: 2765
So many options and so little time to test them all... I wonder if someone has experience with distributed file systems for video streaming and storage/encoding.
I have a lot of huge video files (50 GB to 250 GB) that I need to store somewhere, encode to mp4, and stream from several Adobe FMS servers. The only way to handle all this is with a distributed file system, but now the question is: which one?
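For the encoding step itself, here is a minimal sketch of the kind of command a transcoding worker could run, assuming ffmpeg with libx264/AAC is available; the paths and quality settings are placeholders, not a recommendation:

<?php
// Sketch only: shell out to ffmpeg to produce an H.264/AAC mp4.
// $src and $dst are placeholder paths on the shared storage.
$src = '/mnt/storage/raw/movie1234.mov';
$dst = '/mnt/storage/mp4/movie1234.mp4';
$cmd = sprintf(
    'ffmpeg -i %s -c:v libx264 -preset slow -crf 22 -c:a aac %s',
    escapeshellarg($src),
    escapeshellarg($dst)
);
exec($cmd, $output, $status);
if ($status !== 0) {
    error_log("ffmpeg failed for $src");
}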
My research so far tells me:
So far Lustre seems the winner, but I would like to hear real experience with the particular application I have.
Hadoop, Red Hat GFS, Coda and Windows DFS also sound like options, so any experiences are welcome. If someone has benchmarks, please share.
After some real experience this is what I have learned:
Final conclusion:
Unfortunately the conclusion is "No single silver bullet".
Currently we have our media files on Gluster 3.2 in a replicated volume for storage and transcoding. As long as you don't have a lot of servers and you avoid geo-replication and striped volumes, things work OK.
When we are going to stream the media files, we copy them to a Lustre volume that is replicated to a second Lustre volume via DRBD. The Wowza server then reads the media files from the Lustre volumes.
And finally we use MogileFS to serve the thumbnails in our web application servers.
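As a small sketch of that hand-off (not our actual scripts; both mount points are placeholders), the copy from the Gluster volume to the Lustre volume that Wowza reads from is just a file copy across mounts:

<?php
// Copy a finished mp4 from the Gluster storage volume to the
// DRBD-replicated Lustre volume that the Wowza servers read from.
// Both mount points are assumptions for this sketch.
$gluster_path = '/mnt/gluster/media/movie1234.mp4';
$lustre_path  = '/mnt/lustre/streaming/movie1234.mp4';
if (!copy($gluster_path, $lustre_path)) {
    error_log("Failed to copy $gluster_path to the streaming volume");
}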
Upvotes: 28
Views: 18387
Reputation: 470
GlusterFS has improved a lot since then. It now provides "granular locking" for large files. See here: http://www.gluster.org/community/documentation/index.php/WhatsNew3.3 It also depends quite a bit on the video frame rates you need to support. If you will not go up to 4K rates, Gluster can solve the storage problem. If there is a huge demand for speed, InfiniBand can come into play.
Upvotes: 5
Reputation: 41
Map-reduce doesn't help with a write/read ratio of 90/10! The constant file size is a good thing, and the files are small. So MogileFS sounds like a good alternative, since the Lustre/Gluster "treat it as a cache" situation is not appropriate here.
Upvotes: 1
Reputation: 6953
MogileFS is great for that sort of thing. The client libraries vary a bit in quality, but I'd be surprised if there weren't large-ish scale production sites using just about any language to access it.
HTTP is a good protocol for this stuff actually. Who doesn't have a feature-rich and efficient HTTP client?
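To illustrate the point, pulling part of a stored file over plain HTTP needs nothing exotic; a minimal sketch with PHP's curl extension and a Range request, where the URL is a made-up placeholder rather than a real MogileFS endpoint:

<?php
// Fetch the first 1 MB of a video over plain HTTP with a Range request.
// The hostname and path are placeholders for this sketch.
$ch = curl_init('http://storage.example.com/videos/movie1234.mp4');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_RANGE, '0-1048575'); // bytes 0..1048575
$chunk = curl_exec($ch);
curl_close($ch);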
Upvotes: 2
Reputation:
Check out the Hadoop Distributed File System (HDFS). Its focus is on very large files and parallel computation (with map/reduce); it has high latency but very high throughput. It is currently used in installations as large as Facebook and Amazon.com.
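One hedged illustration: newer Hadoop releases expose a WebHDFS REST interface, so a stored file can be read with an ordinary HTTP client; the namenode address and file path below are assumptions, not defaults you can rely on:

<?php
// Read a file through the WebHDFS REST API (available in newer Hadoop releases).
// The namenode host/port and file path are assumptions for this sketch.
$url = 'http://namenode.example.com:50070/webhdfs/v1/videos/movie1234.mp4?op=OPEN';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // OPEN redirects to a datanode
$data = curl_exec($ch);
curl_close($ch);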
Upvotes: 2
Reputation: 61
Of the named systems, the most suitable is MogileFS.
But perhaps you can get by without any special system at all. Say you have 4 Adobe FMS servers:
{video0.example.com, video1.example.com, video2.example.com, video3.example.com}.
You can distribute all your videos among those 4 servers using a simple scheme, like:
<?php
/*
 * PHP sketch of the hashing scheme
 */
function get_server_id($filename, $server_count = 4)
{
    // crc32() can be negative on 32-bit builds, so take abs() before the modulo
    return abs(crc32($filename)) % $server_count;
}
After you encode a video, your app would do something like:
$server_id = get_server_id($file_name);
copy($file_name, '/mnt/' . $server_id . '/' . basename($file_name));
Clients will access videos using something like http://videoN.example.com/filename.mp4, where N is calculated from the filename using get_server_id().
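For instance, a sketch of the client-side URL construction, reusing get_server_id() from above (the filename is just an example):

<?php
// Build the streaming URL for a given video file.
$file = 'movie1234.mp4';
$url  = sprintf('http://video%d.example.com/%s', get_server_id($file), $file);
// yields http://videoN.example.com/movie1234.mp4 with N in 0..3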
Lustre/Gluster is really not what you should be looking for. Lustre FS is more mature, but its developers ask you to treat files on such a FS as a "cache", i.e. they can be lost at any time.
Lustre/Gluster are targeted at HPC use, to allow fast access to huge amounts of data without a single storage server becoming a performance bottleneck. Another point for those systems is that they are POSIX-compliant; in an HPC/scientific research environment you usually do not have time to waste on rewriting your apps just because you installed a cool new fast FS.
Upvotes: 1