Reena Upadhyay

Reputation: 2017

Spark 1.5.1 not working with hive jdbc 1.2.0

I am trying to execute a Hive query using Spark 1.5.1 in standalone mode and the Hive 1.2.0 JDBC driver.

Here is my piece of code:

private static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
private static final String HIVE_CONNECTION_URL = "jdbc:hive2://localhost:10000/idw";

private static final SparkConf sparkconf = new SparkConf()
        .set("spark.master", "spark://impetus-i0248u:7077")
        .set("spark.app.name", "sparkhivesqltest")
        .set("spark.cores.max", "1")
        .set("spark.executor.memory", "512m");

private static final JavaSparkContext sc = new JavaSparkContext(sparkconf);
private static final SQLContext sqlContext = new SQLContext(sc);

public static void main(String[] args) {
    // Data source options
    Map<String, String> options = new HashMap<String, String>();
    options.put("driver", HIVE_DRIVER);
    options.put("url", HIVE_CONNECTION_URL);
    options.put("dbtable", "(select * from idw.emp) as employees_name");

    DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
}

I am getting the error below at sqlContext.read().format("jdbc").options(options).load():

Exception in thread "main" java.sql.SQLException: Method not supported
    at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:143)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:135)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)

I am running Spark 1.5.1 in standalone mode. The Hadoop version is 2.6 and the Hive version is 1.2.0.

Here are the dependencies that I have added to the Java project's pom.xml:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.0</version>
</dependency>

Can anyone help me out with this? If somebody has used Spark 1.5.1 with Hive JDBC, can you please tell me which Hive version is compatible with Spark 1.5.1?

Thank you in advance!

Upvotes: 2

Views: 2549

Answers (1)

Dennis Huo

Reputation: 10677

As far as I can tell, you're unfortunately out of luck in terms of using the JDBC connector until it's fixed upstream. The "Method not supported" in this case is not just a version mismatch; the method is explicitly left unimplemented in the Hive JDBC library's branch-1.2, and even if you look at the Hive JDBC master branch or branch-2.0 it's still not implemented:

public boolean isSigned(int column) throws SQLException {
  throw new SQLException("Method not supported");
}

Looking at the Spark call site, isSigned is called during resolveTable in Spark 1.5 as well as on master.
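You can reproduce the failure without Spark at all. The snippet below is a hypothetical standalone check (not from the original post) against the same jdbc:hive2://localhost:10000/idw endpoint and idw.emp table; it calls isSigned directly through the Hive JDBC driver and hits the same SQLException that resolveTable trips over:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

// Hypothetical reproduction, not part of the original question's code.
public class IsSignedCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/idw");
        ResultSet rs = conn.createStatement().executeQuery("select * from idw.emp limit 1");
        ResultSetMetaData md = rs.getMetaData();
        // This is the same metadata call Spark's JDBCRDD.resolveTable makes while
        // building the schema; HiveResultSetMetaData throws "Method not supported" here.
        md.isSigned(1);
    }
}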

That said, most likely the real reason this "issue" remains is that when interacting with Hive, you're expected to connect to the Hive metastore directly rather than needing to mess around with JDBC connectors; see the Hive Tables in Spark documentation for how to do this. Essentially, you want to think of Spark as an equal/replacement of Hive rather than as a consumer of Hive.

This way, pretty much all you do is add hive-site.xml to your Spark conf/ directory and make sure the datanucleus jars under lib_managed/jars are available to all Spark executors. Spark then talks directly to the Hive metastore for schema info and fetches the data directly from HDFS in a way that is amenable to nicely parallelized RDDs.
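For example, with hive-site.xml in Spark's conf/ directory and the org.apache.spark:spark-hive_2.10:1.5.1 artifact added alongside spark-core and spark-sql in the pom, a minimal sketch of the metastore-based approach might look like the following. The master URL and the idw.emp table are taken from the question; the class and app names are just placeholders for illustration.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

// Sketch of reading the same table through the Hive metastore instead of JDBC.
public class SparkHiveMetastoreExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .set("spark.master", "spark://impetus-i0248u:7077")
                .set("spark.app.name", "sparkhivemetastoretest")
                .set("spark.cores.max", "1")
                .set("spark.executor.memory", "512m");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // HiveContext reads hive-site.xml from conf/ and talks to the metastore
        // directly, so the Hive JDBC driver (and its isSigned gap) never comes into play.
        HiveContext hiveContext = new HiveContext(sc.sc());

        DataFrame employees = hiveContext.sql("SELECT * FROM idw.emp");
        employees.show();
    }
}

Queries run this way are planned by Spark itself, so the result comes back as a normal DataFrame backed by parallelized reads from HDFS rather than being funneled through a single JDBC connection.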

Upvotes: 6
