Reputation: 519
What effort is required to replace HDFS in Hadoop with another NoSQL database? How much work is involved? Does anyone have a good wiki or links describing it? Is it as simple as implementing the FileSystem interface for that DB?
I found a couple of articles about how other people have modified Hadoop to generate custom distributions, but I haven't found a guide to replacing HDFS.
Thanks, Parth
Upvotes: 2
Views: 2484
Reputation: 41428
I actually did this not so long ago: disk space constraints on HDFS were limiting our backup and storage strategy, so we looked at using S3N as a replacement for HDFS, and it turns out to be a fairly standard operation.
You need to add the following properties to hadoop-site.xml or hdfs-site.xml:
<property>
  <name>fs.default.name</name>
  <value>s3://BUCKET</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>
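Once those properties are in place, standard commands should work against the bucket directly; for example, something like hadoop fs -ls s3://BUCKET/ (using the same bucket name as above) should list its contents.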
You can find more details on setting this up here. One thing worth noting: since the data is stored on Amazon S3 in this case, it has to be fetched over the network rather than read locally, but the impact on performance doesn't seem as significant as I initially feared.
Something I haven't tried but that you should definitely look at as an alternative to HDFS is QFS from Quantcast, which I've been hearing good things about; benchmarks suggest it is faster than HDFS.
Upvotes: 2
Reputation: 8088
It is relatively simple to implement your own DFS interface and make it work with Hadoop. All you need is some logical mapping between the file system concepts of file and directory and your storage.
In the case of a NoSQL store (assuming a key-value model), you have to decide how to represent directories: you can create special directory nodes, or you can encode the full path into the key (the sketch below takes the second approach).
Another decision point is whether you care about data locality, i.e. whether Hadoop can schedule tasks close to where the data is actually stored.
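To make the shape of the work concrete, here is a minimal, untested sketch of such a FileSystem subclass. KVStoreClient, the kv:// scheme, and the toKey mapping are hypothetical placeholders for whatever NoSQL client you would actually use; only the overridden methods are the real org.apache.hadoop.fs.FileSystem API.

// Minimal sketch of a Hadoop FileSystem backed by a key-value store.
// "KVStoreClient" and the kv:// scheme are hypothetical placeholders;
// the overridden methods are the real org.apache.hadoop.fs.FileSystem API.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class KVFileSystem extends FileSystem {

    private URI uri;
    private Path workingDir = new Path("/");
    // private KVStoreClient store;  // your NoSQL client (hypothetical)

    @Override
    public void initialize(URI name, Configuration conf) throws IOException {
        super.initialize(name, conf);
        this.uri = name;
        // this.store = KVStoreClient.connect(conf.get("fs.kv.endpoint"));
    }

    @Override
    public URI getUri() { return uri; }

    // Simplest mapping: the absolute path string is the key, the file bytes
    // are the value, and a directory is a key with a marker value.
    private String toKey(Path f) {
        return makeQualified(f).toUri().getPath();
    }

    @Override
    public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        // byte[] data = store.get(toKey(f));
        // wrap the bytes in a Seekable stream and return new FSDataInputStream(...)
        throw new IOException("open: not implemented in this sketch");
    }

    @Override
    public FSDataOutputStream create(Path f, FsPermission permission,
            boolean overwrite, int bufferSize, short replication,
            long blockSize, Progressable progress) throws IOException {
        // Buffer writes locally, then store.put(toKey(f), bytes) on close().
        throw new IOException("create: not implemented in this sketch");
    }

    @Override
    public FSDataOutputStream append(Path f, int bufferSize,
            Progressable progress) throws IOException {
        throw new IOException("append: not supported");
    }

    @Override
    public boolean rename(Path src, Path dst) throws IOException {
        // Copy the value to the new key, then delete the old key.
        return false;
    }

    @Override
    public boolean delete(Path f, boolean recursive) throws IOException {
        // store.delete(toKey(f)), plus a prefix scan of children if recursive.
        return false;
    }

    @Override
    public FileStatus[] listStatus(Path f) throws IOException {
        // Prefix scan for keys starting with toKey(f) + "/".
        return new FileStatus[0];
    }

    @Override
    public void setWorkingDirectory(Path dir) { workingDir = dir; }

    @Override
    public Path getWorkingDirectory() { return workingDir; }

    @Override
    public boolean mkdirs(Path f, FsPermission permission) throws IOException {
        // store.put(toKey(f), DIRECTORY_MARKER) for f and any missing parents.
        return true;
    }

    @Override
    public FileStatus getFileStatus(Path f) throws IOException {
        // Look up toKey(f) and build a FileStatus (length, isDirectory, mtime, ...).
        throw new IOException("getFileStatus: not implemented in this sketch");
    }
}

To wire it up you would register the class in core-site.xml with a property like fs.kv.impl set to the fully qualified class name (the kv scheme here is made up), so that paths such as kv://host/dir/file resolve to this implementation, and then point fs.default.name at it just as the S3 example above does. The S3N source mentioned below follows this same structure and is a good reference for the parts this sketch leaves out.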
Regarding documentation, I think the source of the S3N FileSystem implementation is the best place to start.
I think the closest example is Hadoop over Cassandra, done by DataStax: http://www.datastax.com/
Another example (something we did recently) is Hadoop integration with OpenStack Swift: http://bigdatacraft.com/archives/349
Upvotes: 2