Harinder

Reputation: 11944

Use Apache Hadoop JAR files or vendor specific?

I am creating an application for Hadoop that should run on all the Hadoop distributions provided by different vendors: Cloudera, MapR, Hortonworks, Pivotal, etc. My application would be deployed on application servers like WebLogic or JBoss, or it could also be deployed on Tomcat. So my question is: suppose some version of each of these vendors uses the same underlying Hadoop version, say Hadoop 2.0. Should I use the JAR files given by these vendors, or the JAR files given by Apache Hadoop?

I mean the JAR files that contain the same classes as Apache Hadoop but have the vendor's name in them, like blablaCDH5.2blabla.jar. Should I use those or the ones from Apache, so that I can build a single version for Hadoop 2.0 and use it for all vendors? Can that be done, or do I have to build a different flavour of my app for each vendor distribution?
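One common pattern for this situation (a sketch, not a definitive recipe; the artifact version shown is illustrative) is to compile against the Apache Hadoop client artifacts but mark them `provided`, so your build does not bundle Hadoop classes at all and whatever JARs the vendor's cluster or application server supplies are used at runtime:

```xml
<!-- Compile against the Apache Hadoop client API, but do not bundle it:
     "provided" scope keeps Hadoop classes out of your artifact so the
     JARs shipped with the vendor's distribution are used at runtime.
     The version number here is illustrative. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
  <scope>provided</scope>
</dependency>
```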

Thanks in advance

Upvotes: 1

Views: 205

Answers (4)

Sachin Janani

Reputation: 1319

You can create a shim layer that allows your application to run with any Hadoop distribution. As most distributions ship different Hadoop versions, it is very difficult to deal with this problem directly, so many vendors now create a shim layer that can work with any Hadoop distribution. Shim layers have been implemented in many applications, such as Pentaho, Hive, and Gora.
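A shim layer might look something like the following minimal sketch: a distribution-neutral interface, one implementation per distribution, and a factory that picks an implementation from the Hadoop version string. All class and method names here are hypothetical, invented for illustration; they are not Pentaho's or Hive's actual shim API.

```java
// Minimal shim-layer sketch. In a real application the version string
// would come from org.apache.hadoop.util.VersionInfo.getVersion();
// here it is passed in directly so the example is self-contained.

interface HadoopShim {
    // Real shims would wrap distribution-specific calls here,
    // e.g. job submission or filesystem access.
    String distribution();
}

class CdhShim implements HadoopShim {
    public String distribution() { return "CDH"; }
}

class HdpShim implements HadoopShim {
    public String distribution() { return "HDP"; }
}

class VanillaShim implements HadoopShim {
    public String distribution() { return "Apache"; }
}

final class HadoopShimFactory {
    // Vendor builds typically embed a marker in the version string,
    // e.g. "2.0.0-cdh5.2.0" for Cloudera.
    static HadoopShim forVersion(String version) {
        if (version.contains("cdh")) return new CdhShim();
        if (version.contains("hdp")) return new HdpShim();
        return new VanillaShim();
    }
}

public class ShimDemo {
    public static void main(String[] args) {
        System.out.println(HadoopShimFactory.forVersion("2.0.0-cdh5.2.0").distribution()); // CDH
        System.out.println(HadoopShimFactory.forVersion("2.0.0").distribution());          // Apache
    }
}
```

The rest of the application codes only against `HadoopShim`, so a single build can run on any distribution for which a shim exists.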

Upvotes: 2

buckaroo1177125

Reputation: 1683

One approach, which may vary slightly based on your version control and build systems, would be to have separate build scripts using the dependencies from the different distributions.

Where test cases fail for a given distribution you could have a branch/fork for that distribution or, probably less desirable, have a specific build which does some pre-build magic for that distribution.

This way you should be able to maintain a consistent trunk while being able to track and handle issues that come up in vendor/version-specific distributions going forward. This would definitely be possible with Git and most build systems (e.g. Gradle, Maven, or Ant).
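With Maven, the separate-build-scripts idea maps naturally onto profiles, for example (a sketch only; the versions shown are illustrative, and the Cloudera repository URL is the publicly documented one):

```xml
<!-- One profile per vendor distribution; select at build time with
     e.g. `mvn package -P cdh`. Versions are illustrative. -->
<profiles>
  <profile>
    <id>apache</id>
    <properties>
      <hadoop.version>2.2.0</hadoop.version>
    </properties>
  </profile>
  <profile>
    <id>cdh</id>
    <properties>
      <hadoop.version>2.0.0-cdh4.7.0</hadoop.version>
    </properties>
    <repositories>
      <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
      </repository>
    </repositories>
  </profile>
</profiles>
```

The dependency section then references `${hadoop.version}`, so a single POM produces a build per distribution.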

Upvotes: 2

miljanm

Reputation: 936

It depends on how deep into the Hadoop API you are treading.

If your application only submits jobs to the cluster, you are probably fine with the vanilla libraries as long as you stick to one specific version. If you are doing advanced stuff and using Hadoop internals, it may be necessary to include the vendor-specific ones.

Upvotes: 1

phoenix

Reputation: 81

You can build your application using the JARs provided by Apache Hadoop, because all of these distributions are modified forms of Apache Hadoop. They all share the same baseline structure, so using the JARs provided by Apache Hadoop won't create any problem. In fact, I am providing you a link for Cloudera showing that they use the JARs provided by Apache Hadoop itself. This is the required link.

Upvotes: 0
