Reputation: 11
Which NoSQL database can be used to store PDFs, text files, Word documents, PowerPoint files, and so on? Can anyone give some suggestions? Is it advisable to use Cassandra for this purpose?
Upvotes: 1
Views: 5341
Reputation: 3514
The best NoSQL option for storing documents and searching them is a dedicated search server, optionally backed by a separate storage solution. The two main search options are Solr and ElasticSearch. For simple cases, you don't need a separate storage backend for them; they act as a NoSQL store on their own. If their built-in storage (the local filesystem, or HDFS if you are on Hadoop) is not appropriate for your needs, you can offload the actual data to a separate storage solution.
Pretty much any document-oriented or key-value NoSQL database can store BLOBs, so you'll have no problem storing arbitrary document files in any of them. The real question is how well a particular store fits your usage needs and how well it integrates with the search solution you're considering. Based on a cursory look, there is some level of existing Solr integration for common options such as Cassandra, MongoDB, HBase, and Riak. ElasticSearch seems to have less support in some cases.
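To make the split between storage and search concrete, here is a minimal in-memory sketch. It is not Solr, ElasticSearch, or Cassandra code: `BlobStore`, `SearchIndex`, and `save_document` are hypothetical stand-ins I made up for illustration, and the text is assumed to have already been extracted from each file by some other tool. The idea is only that the raw bytes live in the store under a key, while the search side indexes the extracted text and hands back that same key.

```python
import re
from collections import defaultdict


class BlobStore:
    """Hypothetical stand-in for a key-value or document store holding raw file bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class SearchIndex:
    """Hypothetical stand-in for a search server: maps each term to the keys containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, key, text):
        # Lowercase and split on word characters; real search servers do far more analysis.
        for term in re.findall(r"\w+", text.lower()):
            self._postings[term].add(key)

    def search(self, query):
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        # AND semantics: keep only keys that contain every query term.
        results = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self._postings.get(term, set())
        return results


def save_document(store, index, key, data, extracted_text):
    """Store the raw bytes once and index only the extracted text under the same key."""
    store.put(key, data)
    index.index(key, extracted_text)


store, index = BlobStore(), SearchIndex()
# The byte strings are placeholders, not real file contents.
save_document(store, index, "report.pdf", b"%PDF-1.4 ...", "Quarterly sales report")
save_document(store, index, "notes.txt", b"Sales meeting notes", "Sales meeting notes")

# Search returns keys; the original files are then fetched from the store.
hits = index.search("sales report")
files = {key: store.get(key) for key in hits}
```

With a real setup, the store would be whichever database you pick and the index would be the search server, but the key linking the two plays the same role.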
For Cassandra in particular, there is a project called Solandra that integrates it with Solr. It's an older project that is no longer actively developed, but people have been using it in production successfully. If you need more advanced capabilities, or if you run into compatibility issues, there is also DataStax Enterprise, a commercial product that grew out of Solandra. Meanwhile, there is still no out-of-the-box Cassandra + ElasticSearch integration project that I'm aware of.
Upvotes: 4