Amitabh Ranjan

Reputation: 1500

Compatibility issue between HBase 0.94.2 and the Apache Nutch dependency

I am trying to install Apache Nutch 2.2.1 and have successfully built it after making the required changes to the configuration files, following this tutorial: http://www.blogjava.net/paulwong/archive/2013/08/31/403513.html. But even after building it I cannot crawl anything. After hours of inspection I realized that the HBase version on my company cluster is 0.94.2, whereas Nutch 2.2.1 depends on HBase 0.90.4. Because hbase-0.90.4.jar is not compatible with HBase 0.94.2, I get the following error when I try to inject URLs into Nutch. Kindly help me change the Nutch dependency or fix the error.

I am posting the error below.

Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: Not a host:port pair: �[email protected]�$3�¿½bt13acl1node26.comp.com,60000,1401268790838
	at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:127)
	at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
	at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
	... 12 more

Caused by: java.lang.IllegalArgumentException: Not a host:port pair: �[email protected]�$3�¿½bt13acl1node26.comp.com,60000,1401268790838
	at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:60)
	at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:63)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:354)
	at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:94)
	at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:109)
	... 14 more

Upvotes: 1

Views: 475

Answers (1)

Slava Dobromyslov

Reputation: 3269

You installed Apache Nutch 2.2.1, which uses Apache Gora 0.3. Gora 0.3 supports only the old Apache HBase 0.90.x line, as stated in the official docs.
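To see why the old client jar fails, look at the tail of the error message in the question: the master address arrives as `bt13acl1node26.comp.com,60000,1401268790838`, comma-separated, while the exception says the client wanted a `host:port` pair. Below is a minimal sketch of that kind of check. It is a simplified stand-in, not the HBase source; the class name `HostPortSketch`, the method `parse`, and the host `node26.example.com` are hypothetical, and I am assuming the check amounts to looking for a colon.

```java
// Simplified stand-in for the host:port check an HBase 0.90.x client applies
// to the master address it reads. Assumption: not the actual HBase source.
class HostPortSketch {

    // Splits "host:port" at the last colon; rejects input that has no colon.
    static String[] parse(String hostAndPort) {
        int colon = hostAndPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a host:port pair: " + hostAndPort);
        }
        return new String[] {
            hostAndPort.substring(0, colon),
            hostAndPort.substring(colon + 1)
        };
    }

    public static void main(String[] args) {
        // The form the old client expects (hypothetical host name).
        String[] ok = parse("node26.example.com:60000");
        System.out.println(ok[0] + " / " + ok[1]);

        // The comma-separated form from the error message in the question.
        try {
            parse("bt13acl1node26.comp.com,60000,1401268790838");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second call throws because the string contains no colon at all, so no configuration change on the Nutch side can make the 0.90.4 client accept it. The client library itself has to match the cluster.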

Anyway, you can still use Nutch 2.2.x with the following workaround:

  1. Clone, configure and build a fresh Nutch from the official Git branch 2.x, which has migrated to Gora 0.4 and is compatible with Apache HBase 0.94.x.

  2. Clone and build my version of Apache HBase 0.94.24-hadoop-2.5.0 to use it with the latest Apache Hadoop 2.5.0.

A similar issue was filed against the Apache Gora 0.3 project. They do not plan to upgrade the Apache HBase dependency to a newer version in the near future.

You can also read the Apache HBase compatibility documentation to figure out how to build your own version for any Hadoop release.

Apache Nutch was tested and works well with the following stack:

  • Apache Nutch from 2.x git branch which uses Gora 0.4;
  • Apache HBase 0.94.24-hadoop-2.5.0;
  • Apache Hadoop 2.5.0.

Upvotes: 1
