Can I provide a NativeFileSystemStore to Hadoop as a plugin?

Question

I want to add S3-like support to Hadoop for a different object store that doesn't currently have support in Hadoop. I can't figure out if Hadoop has a plugin model for native filesystems or not.

Is it as simple as implementing the NativeFileSystemStore interface and creating a JAR that Hadoop can load? Or is there more to it?

rav · Accepted Answer

It’s relatively simple: Hadoop uses reflection, configuration, and/or the Java services mechanism to load custom filesystems.

With configuration, you (or your users) define the following in core-site.xml:


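<property>
  <name>fs.SCHEME.impl</name>
  <value>me.elijah.AwesomeFS</value>
  <description>The FileSystem for SCHEME URIs.</description>
</property>

<property>
  <name>fs.AbstractFileSystem.SCHEME.impl</name>
  <value>me.elijah.AwesomeAbstractFS</value>
  <description>The AbstractFileSystem for SCHEME URIs, for Hadoop 2.x only.</description>
</property>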

Note the part between fs. and .impl in the property name: this is where you put the scheme of your URI, for example hdfs, file, local, s3, gs … Whenever the filesystem-agnostic part of Hadoop encounters a URI, it parses the scheme and fetches the matching class via configuration and reflection.
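
The lookup described above can be sketched with plain JDK classes. This is only an illustration of the idea, not Hadoop's actual code: the class SchemeLookupSketch and its methods are hypothetical, a Map stands in for the Hadoop configuration, and java.util.ArrayList stands in for a filesystem class so the sketch runs without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.Map;

// Illustrative only: mimics the scheme-to-class lookup idea, not Hadoop's real implementation.
final class SchemeLookupSketch {

    // Builds the configuration key for a URI scheme, e.g. "fs.awesome.impl".
    static String implKey(String scheme) {
        return "fs." + scheme + ".impl";
    }

    // Parses the scheme from the URI, looks up the configured class name,
    // and loads that class by reflection.
    static Class<?> resolve(URI uri, Map<String, String> conf) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        String className = conf.get(implKey(scheme));
        if (className == null) {
            throw new IllegalArgumentException("No filesystem configured for scheme: " + scheme);
        }
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // The class is configured but its JAR is not on the classpath.
            throw new IllegalStateException("Class not on classpath: " + className, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: "awesome" is a made-up scheme; ArrayList is a stand-in class.
        Map<String, String> conf = Map.of("fs.awesome.impl", "java.util.ArrayList");
        Class<?> impl = resolve(URI.create("awesome://bucket/path/file.txt"), conf);
        System.out.println(impl.getName());
    }
}
```

The failure branches mirror the two things that typically go wrong in practice: the property is missing from the configuration, or the class is configured but the JAR is not on the classpath.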

Those custom filesystem classes must be available on the classpath via your JAR; that is really all you need to do to integrate a custom filesystem. Of course, the classes have to extend the corresponding Hadoop base classes:

me.elijah.AwesomeFS extends org.apache.hadoop.fs.FileSystem
me.elijah.AwesomeAbstractFS extends org.apache.hadoop.fs.AbstractFileSystem

You need me.elijah.AwesomeAbstractFS if you want to use Hadoop 2.x/YARN.

If you want your filesystem to be registered automatically, you can publish it via a service file (META-INF/services/org.apache.hadoop.fs.FileSystem in your JAR). If you add your own service file and also build assembly JARs, make sure to use a sane merge policy so that you don’t lose any of the services. What worked best for me was to filter distinct lines, or to merge the org.apache.hadoop.fs.FileSystem service files.
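
To make the "filter distinct lines" merge policy concrete, here is a minimal sketch of what such a merge does to several service files. It is not the code of any particular build plugin; the class ServiceFileMergeSketch and the method mergeDistinct are hypothetical names, and the file contents in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: shows the effect of a "distinct lines" merge on service files.
final class ServiceFileMergeSketch {

    // Merges the contents of several service files, keeping each provider
    // class name once and preserving first-seen order.
    static List<String> mergeDistinct(List<String> serviceFileContents) {
        Set<String> providers = new LinkedHashSet<>();
        for (String content : serviceFileContents) {
            for (String line : content.split("\\R")) {
                // Service files allow '#' comments; drop them and trim whitespace.
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return new ArrayList<>(providers);
    }

    public static void main(String[] args) {
        // Assumption: example contents of two service files with one overlapping entry.
        String hadoopJar = "org.apache.hadoop.fs.LocalFileSystem\n"
                + "org.apache.hadoop.hdfs.DistributedFileSystem\n";
        String myJar = "# my custom filesystem\n"
                + "me.elijah.AwesomeFS\n"
                + "org.apache.hadoop.fs.LocalFileSystem\n";
        mergeDistinct(List.of(hadoopJar, myJar)).forEach(System.out::println);
    }
}
```

The point is that every provider from every JAR survives the merge; a policy that simply keeps the first or last service file would silently drop either Hadoop's filesystems or yours.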
