Reputation: 394
I have a task to create a metadata table for my timeseries Cassandra DB. This metadata table needs to store over 500 PDF files, each 5-10 MB in size.
I have thought of storing them as blobs. Is Cassandra able to do that?
Upvotes: 0
Views: 2103
Reputation: 3266
Cassandra isn't a perfect fit for such blobs, and DataStax recommends keeping them smaller than 1 MB for best performance.
But just try it for yourself and do some testing. Problems arise when partitions grow large and receive updates, because the coordinator then has a lot of work to do reassembling them.
A simple way to go: store each blob separately as a uuid key/value pair in its own table, and store only the uuid alongside your data. When a blob is updated, insert a new one under a new uuid and update your records to point at it. With this trick you never have different (and possibly large) versions of one blob piling up, and performance won't suffer as much. I think I read that Walmart did this successfully with images, some around 10 MB and others smaller.
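For illustration, here is a minimal sketch of that uuid indirection using the DataStax Python driver (pip install cassandra-driver). The keyspace and table/column names (my_keyspace, pdf_blob, pdf_metadata) are made up for this example, and it assumes a reachable local node:

    import uuid
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])          # assumption: local single-node cluster
    session = cluster.connect("my_keyspace")  # assumption: keyspace already exists

    # Blobs live in their own table, keyed only by a surrogate uuid.
    session.execute("""
        CREATE TABLE IF NOT EXISTS pdf_blob (
            id   uuid PRIMARY KEY,
            data blob
        )""")

    # The metadata table stores only the reference, never the blob itself.
    session.execute("""
        CREATE TABLE IF NOT EXISTS pdf_metadata (
            name   text PRIMARY KEY,
            pdf_id uuid
        )""")

    def upsert_pdf(name, pdf_bytes):
        # An "update" writes a brand-new blob under a fresh uuid and then
        # repoints the metadata row, so no blob is ever rewritten in place.
        new_id = uuid.uuid4()
        session.execute("INSERT INTO pdf_blob (id, data) VALUES (%s, %s)",
                        (new_id, pdf_bytes))
        session.execute("INSERT INTO pdf_metadata (name, pdf_id) VALUES (%s, %s)",
                        (name, new_id))

Cleaning up old blob versions (lazily, or with a TTL) is a design choice this sketch leaves open.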
Just try it out if you already have Cassandra.
If not, you might have a look at Ceph or something similar, but that needs its own deployment.
Upvotes: 1
Reputation: 91
You can serialize the file and store it as a blob; the cost is deserialization when reading the file back. There are many serialization/deserialization libraries that do this efficiently. Another way is to do what @jasim waheed suggested, but that will result in network IO. So you can decide where you want to pay the cost.
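As a rough sketch of the direct-blob approach with the DataStax Python driver, assuming a hypothetical pdf_store table (for a PDF, "serialization" is really just reading the raw bytes):

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])          # assumption: local node
    session = cluster.connect("my_keyspace")  # assumption: keyspace already exists

    # Hypothetical table holding the raw file bytes in a blob column.
    session.execute("""
        CREATE TABLE IF NOT EXISTS pdf_store (
            name text PRIMARY KEY,
            data blob
        )""")

    # Write: serialize by reading the file's raw bytes.
    with open("report.pdf", "rb") as f:
        session.execute("INSERT INTO pdf_store (name, data) VALUES (%s, %s)",
                        ("report.pdf", f.read()))

    # Read: deserialize by writing the bytes back to a file.
    row = session.execute("SELECT data FROM pdf_store WHERE name = %s",
                          ("report.pdf",)).one()
    with open("report_copy.pdf", "wb") as f:
        f.write(row.data)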
Upvotes: 0