Reputation: 519
What effort is required to replace HDFS in Hadoop with another NoSQL database? How much work is involved? Does anyone have a good wiki or links describing it? Is it as simple as implementing the FileSystem interface for that DB?
I found a couple of articles about how other people have modified Hadoop to generate custom distributions, but I haven't found a guide to replacing HDFS.
Thanks, Parth
Upvotes: 2
Views: 2484
Reputation: 41428
I actually did this not so long ago: disk space constraints on HDFS were limiting our backup and storage strategy, so we looked at using S3N as a replacement for HDFS, and it turns out to be a fairly standard operation.
You need to add the following properties to hadoop-site.xml or hdfs-site.xml:
<property>
  <name>fs.default.name</name>
  <value>s3://BUCKET</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>
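Once those properties are in place, standard commands should work against the bucket directly; for example, something like hadoop fs -ls s3://BUCKET/ (using the same bucket name as above) should list its contents.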
You can find more details on setting this up here. One thing worth noting: since the data is stored on Amazon S3 in this case, it has to be fetched over the network rather than read locally, but the impact on performance doesn't seem as significant as I initially feared.
Something I haven't tried but that you should definitely look at as an alternative to HDFS is QFS from Quantcast, which I've been hearing good things about; benchmarks suggest it is faster than HDFS.
Upvotes: 2
Reputation: 8088
It is relatively simple to implement your own DFS interface and make it work with Hadoop. All you need is some logical mapping between the file system concepts of file and directory and your storage.
In the case of a NoSQL store (assuming a key-value model), you have to decide how to represent directories: you can create special directory nodes, or you can encode the full path into the key (the sketch below takes the second approach).
Another decision point is whether you care about data locality, i.e. whether Hadoop can schedule tasks close to where the data is actually stored.
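To make the shape of the work concrete, here is a minimal, untested sketch of such a FileSystem subclass. KVStoreClient, the kv:// scheme, and the toKey mapping are hypothetical placeholders for whatever NoSQL client you would actually use; only the overridden methods are the real org.apache.hadoop.fs.FileSystem API.

// Minimal sketch of a Hadoop FileSystem backed by a key-value store.
// "KVStoreClient" and the kv:// scheme are hypothetical placeholders;
// the overridden methods are the real org.apache.hadoop.fs.FileSystem API.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class KVFileSystem extends FileSystem {

    private URI uri;
    private Path workingDir = new Path("/");
    // private KVStoreClient store;  // your NoSQL client (hypothetical)

    @Override
    public void initialize(URI name, Configuration conf) throws IOException {
        super.initialize(name, conf);
        this.uri = name;
        // this.store = KVStoreClient.connect(conf.get("fs.kv.endpoint"));
    }

    @Override
    public URI getUri() { return uri; }

    // Simplest mapping: the absolute path string is the key, the file bytes
    // are the value, and a directory is a key with a marker value.
    private String toKey(Path f) {
        return makeQualified(f).toUri().getPath();
    }

    @Override
    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        // byte[] data = store.get(toKey(f));
        // wrap the bytes in a Seekable stream and return new FSDataInputStream(...)
        throw new IOException("open: not implemented in this sketch");
    }

    @Override
    public FSDataOutputStream create(Path f, FsPermission permission,
            boolean overwrite, int bufferSize, short replication,
            long blockSize, Progressable progress) throws IOException {
        // Buffer writes locally, then store.put(toKey(f), bytes) on close().
        throw new IOException("create: not implemented in this sketch");
    }

    @Override
    public FSDataOutputStream append(Path f, int bufferSize,
            Progressable progress) throws IOException {
        throw new IOException("append: not supported");
    }

    @Override
    public boolean rename(Path src, Path dst) throws IOException {
        // Copy the value to the new key, then delete the old key.
        return false;
    }

    @Override
    public boolean delete(Path f, boolean recursive) throws IOException {
        // store.delete(toKey(f)), plus a prefix scan of children if recursive.
        return false;
    }

    @Override
    public FileStatus[] listStatus(Path f) throws IOException {
        // Prefix scan for keys starting with toKey(f) + "/".
        return new FileStatus[0];
    }

    @Override
    public void setWorkingDirectory(Path dir) { workingDir = dir; }

    @Override
    public Path getWorkingDirectory() { return workingDir; }

    @Override
    public boolean mkdirs(Path f, FsPermission permission) throws IOException {
        // store.put(toKey(f), DIRECTORY_MARKER) for f and any missing parents.
        return true;
    }

    @Override
    public FileStatus getFileStatus(Path f) throws IOException {
        // Look up toKey(f) and build a FileStatus (length, isDirectory, mtime, ...).
        throw new IOException("getFileStatus: not implemented in this sketch");
    }
}

To wire it up you would register the class in core-site.xml with a property like fs.kv.impl set to the fully qualified class name (the kv scheme here is made up), so that paths such as kv://host/dir/file resolve to this implementation, and then point fs.default.name at it just as the S3 example above does. The S3N source mentioned below follows this same structure and is a good reference for the parts this sketch leaves out.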
Regarding documentation, I think the source of the S3N FileSystem implementation is the best place to start.
I think the closest example is Hadoop over Cassandra, done by DataStax: http://www.datastax.com/
Another example (something we did recently) is Hadoop integration with OpenStack Swift: http://bigdatacraft.com/archives/349
Upvotes: 2