Reputation: 13604
I want to add S3-like support to Hadoop for a different object store that doesn't currently have support in Hadoop. I can't figure out if Hadoop has a plugin model for native filesystems or not.
Is it as simple as implementing the NativeFileSystemStore interface and creating a JAR that Hadoop can load? Is there more to it?
Upvotes: 0
Views: 328
Reputation: 3688
It’s relatively simple: Hadoop uses reflection, configuration, and/or Java service files to load custom filesystems.
In the configuration case, you (or the user) define the following in core-site.xml:
<property>
  <name>fs.<schema>.impl</name>
  <value>me.elijah.AwesomeFS</value>
  <description>The FileSystem for <schema> uris.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.<schema>.impl</name>
  <value>me.elijah.AwesomeAbstractFS</value>
  <description>The AbstractFileSystem for <schema> uris (Hadoop 2.x only).</description>
</property>
Please note the <schema> part: that is where the scheme of your URIs goes, for example hdfs, file, local, s3, gs, etc. Whenever the filesystem-agnostic part of Hadoop encounters a URI, it parses out the scheme and fetches the proper class/implementation via configuration and reflection.
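As a sketch of what that lookup looks like from client code (the awesome scheme and the me.elijah.AwesomeFS class are the hypothetical ones from the snippet above, assumed to be on the classpath):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SchemeLookupDemo {
  public static void main(String[] args) throws Exception {
    // the programmatic equivalent of the core-site.xml entry above
    Configuration conf = new Configuration();
    conf.set("fs.awesome.impl", "me.elijah.AwesomeFS");

    // Hadoop parses the "awesome" scheme out of the URI, reads
    // fs.awesome.impl and instantiates the class via reflection
    FileSystem fs = FileSystem.get(URI.create("awesome://bucket/data"), conf);
    System.out.println(fs.getClass()); // class me.elijah.AwesomeFS
  }
}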
Those custom filesystem classes then only need to be available on the classpath via your JAR; that is really all there is to integrating a custom filesystem. Of course, the classes have to extend certain base classes:
me.elijah.AwesomeFS extends org.apache.hadoop.fs.FileSystem
me.elijah.AwesomeAbstractFS extends org.apache.hadoop.fs.AbstractFileSystem
You need me.elijah.AwesomeAbstractFS only if you want to use Hadoop 2.x/YARN; a minimal sketch of both classes follows.
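To give an idea of the surface you have to cover, here is a compilable sketch of the two classes. The awesome scheme and the me.elijah names are the hypothetical ones from the configuration above, and every stub would be backed by calls into your object store's client library:

// AwesomeFS.java -- all abstract methods of FileSystem stubbed out
package me.elijah;

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.util.Progressable;

public class AwesomeFS extends FileSystem {

  private URI uri;
  private Path workingDir = new Path("/");

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    super.initialize(name, conf);
    uri = URI.create(name.getScheme() + "://" + name.getAuthority());
    // TODO: open the connection to the object store here
  }

  @Override
  public String getScheme() { return "awesome"; } // hypothetical scheme

  @Override
  public URI getUri() { return uri; }

  @Override
  public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    throw new UnsupportedOperationException("TODO: stream object contents");
  }

  @Override
  public FSDataOutputStream create(Path f, FsPermission permission,
      boolean overwrite, int bufferSize, short replication, long blockSize,
      Progressable progress) throws IOException {
    throw new UnsupportedOperationException("TODO: write a new object");
  }

  @Override
  public FSDataOutputStream append(Path f, int bufferSize,
      Progressable progress) throws IOException {
    // objects are immutable in most stores, so append often stays unsupported
    throw new UnsupportedOperationException("append is not supported");
  }

  @Override
  public boolean rename(Path src, Path dst) throws IOException {
    throw new UnsupportedOperationException("TODO: copy + delete");
  }

  @Override
  public boolean delete(Path f, boolean recursive) throws IOException {
    throw new UnsupportedOperationException("TODO: delete object(s)");
  }

  @Override
  public FileStatus[] listStatus(Path f) throws IOException {
    throw new UnsupportedOperationException("TODO: list objects by prefix");
  }

  @Override
  public void setWorkingDirectory(Path dir) { workingDir = dir; }

  @Override
  public Path getWorkingDirectory() { return workingDir; }

  @Override
  public boolean mkdirs(Path f, FsPermission permission) throws IOException {
    throw new UnsupportedOperationException("TODO: create directory marker");
  }

  @Override
  public FileStatus getFileStatus(Path f) throws IOException {
    throw new UnsupportedOperationException("TODO: fetch object metadata");
  }
}

For the Hadoop 2.x side you rarely have to implement AbstractFileSystem from scratch; a common shortcut is to extend org.apache.hadoop.fs.DelegateToFileSystem, which forwards everything to an existing FileSystem instance (the s3a connector wires its AbstractFileSystem this way):

// AwesomeAbstractFS.java -- the Hadoop 2.x/YARN side
package me.elijah;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;

public class AwesomeAbstractFS extends DelegateToFileSystem {
  // Hadoop instantiates AbstractFileSystem implementations via reflection
  // using exactly this (URI, Configuration) constructor shape
  protected AwesomeAbstractFS(URI uri, Configuration conf)
      throws IOException, URISyntaxException {
    super(uri, new AwesomeFS(), conf, "awesome", false);
  }
}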
If you want your filesystem to be registered automatically, you can publish it via a service file (example). Also, if you add your own service file and happen to produce assembly JARs, make sure to use sane merge policies (you don’t want to lose some of the services); the approach that worked best for me was to filter distinct lines when merging the org.apache.hadoop.fs.FileSystem service files (example).
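For reference, such a service file is just a plain text resource inside your JAR, named after the base class and listing one implementation per line (class name hypothetical, as above):

META-INF/services/org.apache.hadoop.fs.FileSystem:

me.elijah.AwesomeFS

Hadoop loads these entries via java.util.ServiceLoader and registers each filesystem under the scheme returned by its getScheme() method, so no fs.<schema>.impl entry is needed. When building an assembly JAR, maven-shade-plugin’s ServicesResourceTransformer concatenates service files from all dependencies instead of letting one overwrite another; sbt-assembly’s filterDistinctLines merge strategy does the same.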
Upvotes: 1