Elijah

Can I provide a NativeFileSystemStore to Hadoop as a plugin?

I want to add S3-like support to Hadoop for a different object store that doesn't currently have support in Hadoop. I can't figure out if Hadoop has a plugin model for native filesystems or not.

Is it as simple as implementing the NativeFileSystemStore interface and creating a JAR that Hadoop can load? Or is there more to it?

Answers (1)

rav

It’s relatively simple: Hadoop uses reflection, configuration and/or Java service files to load a custom filesystem.

Using configuration, you (or your users) define the following in core-site.xml:

<property>
  <name>fs.<schema>.impl</name>
  <value>me.elijah.AwesomeFS</value>
  <description>The FileSystem for <schema> uris.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.<schema>.impl</name>
  <value>me.elijah.AwesomeAbstractFS</value>
  <description>The AbstractFileSystem for <schema> uris (Hadoop 2.x and later)</description>
</property>

Please note the <schema> placeholder: this is where you define the scheme part of the URI, for example hdfs, file, local, s3, gs … Whenever the filesystem-agnostic part of Hadoop encounters a URI, it parses the scheme and fetches the matching class/implementation via configuration and reflection.
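The lookup described above can be modelled in a few lines of plain Java. This is a simplified sketch, not Hadoop's actual implementation: a Map stands in for the Configuration loaded from core-site.xml, and SchemeLookupSketch, DemoFS and the "awesome" scheme are hypothetical names used only for illustration. (In Hadoop 2.x the real logic lives in FileSystem.getFileSystemClass, which also falls back to service-loaded filesystems.)

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class SchemeLookupSketch {

    // Hypothetical stand-in for a custom filesystem class such as me.elijah.AwesomeFS.
    public static class DemoFS {
        public DemoFS() {
        }

        @Override
        public String toString() {
            return "DemoFS";
        }
    }

    // A plain Map stands in for Hadoop's Configuration (core-site.xml properties).
    private final Map<String, String> conf = new HashMap<>();

    void set(String key, String value) {
        conf.put(key, value);
    }

    // Parse the scheme from the URI, look up fs.<scheme>.impl, and instantiate that class reflectively.
    Object resolve(String uriString) throws ReflectiveOperationException {
        String scheme = URI.create(uriString).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uriString);
        }
        String className = conf.get("fs." + scheme + ".impl");
        if (className == null) {
            throw new IllegalArgumentException("No FileSystem for scheme: " + scheme);
        }
        Class<?> clazz = Class.forName(className);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        SchemeLookupSketch lookup = new SchemeLookupSketch();
        // Equivalent of <name>fs.awesome.impl</name> in core-site.xml.
        lookup.set("fs.awesome.impl", DemoFS.class.getName());
        System.out.println(lookup.resolve("awesome://bucket/path/file.txt"));
    }
}
```

Running main prints DemoFS; in this sketch an unmapped scheme such as hdfs:// raises IllegalArgumentException instead.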

These custom filesystem classes must be available on the classpath via your JAR; that is really all you need to do to integrate your custom filesystem. Of course, those classes have to extend certain abstract base classes:

  • me.elijah.AwesomeFS extends org.apache.hadoop.fs.FileSystem
  • me.elijah.AwesomeAbstractFS extends org.apache.hadoop.fs.AbstractFileSystem

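You need me.elijah.AwesomeAbstractFS if you want to use your filesystem with Hadoop 2.x/YARN, because parts of it go through the FileContext API, which is backed by AbstractFileSystem.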

If you want your filesystem to be registered automatically, you can publish it via a service file, META-INF/services/org.apache.hadoop.fs.FileSystem (example). Also, if you add your own service file and produce assembly JARs, make sure to use a sane merge policy so you don’t lose any of the services; the one that worked best for me was to filter distinct lines, or otherwise merge, the org.apache.hadoop.fs.FileSystem service files (example).
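A service file lists one fully-qualified FileSystem class per line (text after # is a comment), e.g. a single line me.elijah.AwesomeFS. For service-based registration your class should also override getScheme() to return its scheme, since Hadoop keys service-loaded filesystems by that value. When several JARs each ship their own copy of that file, a "filter distinct lines" merge keeps every provider exactly once. The sketch below models that policy in plain Java; ServiceFileMerger and mergeDistinctLines are hypothetical names, and dropping comments and blank lines is a choice of this sketch. Build tools offer this as a built-in strategy (e.g. sbt-assembly's MergeStrategy.filterDistinctLines), so normally you configure it rather than write it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ServiceFileMerger {

    // Merge several service-file bodies into one, keeping each provider class once,
    // in first-seen order. Text after '#' and blank lines are dropped (a choice of this sketch).
    static String mergeDistinctLines(List<String> serviceFiles) {
        Set<String> providers = new LinkedHashSet<>(); // keeps insertion order, ignores duplicates
        for (String body : serviceFiles) {
            for (String line : body.split("\\R")) {
                int hash = line.indexOf('#');
                String entry = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!entry.isEmpty()) {
                    providers.add(entry);
                }
            }
        }
        return String.join("\n", providers);
    }

    public static void main(String[] args) {
        // Illustrative contents of META-INF/services/org.apache.hadoop.fs.FileSystem from two JARs.
        String fromHadoop = "org.apache.hadoop.fs.LocalFileSystem\n"
                + "org.apache.hadoop.hdfs.DistributedFileSystem\n";
        String fromMyJar = "# custom object store\n"
                + "me.elijah.AwesomeFS\n"
                + "org.apache.hadoop.fs.LocalFileSystem\n";
        System.out.println(mergeDistinctLines(Arrays.asList(fromHadoop, fromMyJar)));
    }
}
```

Here main prints the two Hadoop classes followed by me.elijah.AwesomeFS, with the duplicated LocalFileSystem entry kept only once. A naive "pick first" or "pick last" policy would instead silently drop one JAR's providers.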
