Keith

Reputation: 178

How to set the execution engine to spark when accessing Cloudera Hive via JDBC

I cannot set the execution engine for Hive in a script executed via JDBC. When the same script is executed via the Hue web front end, it takes note that I am trying to set the execution engine to Spark, but not via JDBC:

List<String> result = hiveTemplate.query(script);

An example of the script:

set hive.execution.engine=spark;

SELECT * from ......

I have tried executing an actual script on the classpath, and I have also tried sending a string containing the SQL script via JDBC, as noted above.
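One detail worth checking, since Hue and JDBC behave differently here: as far as I understand, the Hive JDBC driver executes a single statement per call, so a script string containing `set hive.execution.engine=spark;` followed by a `SELECT` will not be split on semicolons for you. A minimal sketch of splitting the script and issuing each statement separately (the splitter class and table name are illustrative, not from the original post):

```java
import java.util.ArrayList;
import java.util.List;

public class HiveScriptSplitter {

    // Naive split of a HiveQL script on semicolons -- the Hive JDBC
    // driver runs one statement per execute() call, so "set ...; SELECT ..."
    // sent as a single string does not behave like it does in Hue.
    // (Does not handle semicolons inside string literals.)
    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String script = "set hive.execution.engine=spark;\nSELECT * FROM some_table";
        for (String stmt : split(script)) {
            // In real code each piece would go to Statement.execute(stmt)
            System.out.println(stmt);
        }
    }
}
```

With this approach the `SET` runs as its own statement on the same session, so the engine setting should apply to the query that follows it.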

I have also tried including the following in the datasource connectionProperties, using a factory class that creates the HiveTemplate:

public static HiveTemplate createHiveTemplate(HiveExecutionEngine engine) {

    Properties props = new Properties();

    switch (engine) {
        case MAP_REDUCE:
            props.setProperty("hive.execution.engine", "mr");
            props.setProperty("mapreduce.map.memory.mb", "16000");
            // JVM options need the leading "-" (was "Xmx7200m")
            props.setProperty("mapreduce.map.java.opts", "-Xmx7200m");
            props.setProperty("mapreduce.reduce.memory.mb", "16000");
            props.setProperty("mapreduce.reduce.java.opts", "-Xmx7200m");
            break;
        case SPARK:
            props.setProperty("hive.execution.engine", "spark");
            break;
        default:
            throw new NotImplementedException("Unsupported engine: " + engine);
    }

    // "datasource" is a field on the factory class (not shown here)
    datasource.setConnectionProperties(props);
    return new HiveTemplate(() -> new HiveClient(datasource));
}

The following link shows the documentation for setting the execution engine: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

set hive.execution.engine=spark;

I would expect the script to be executed via the Spark engine on YARN, not via MapReduce, which is what is happening. I can confirm that the wrong engine is being used from the error message and by viewing the job history in Cloudera Manager.

Has anybody successfully managed to execute a HiveQL script via JDBC using the Spark engine?

Upvotes: 1

Views: 1379

Answers (1)

Keith

Reputation: 178

An update on this question:

It seems to work if you add the configuration properties to the driver URL:

jdbc:hive2://<ip>:10000?hive.execution.engine=spark

This is still not the ideal state, but it is a workable solution.

If anybody has done this via Java config instead, I would be happy to hear from you.
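Not from the original post, but one way the working URL above could be assembled from Java config: build the `?key=value;key=value` conf list from a properties map and append it to the base `jdbc:hive2://` URL before handing it to the datasource. A small sketch (the class and method names are hypothetical):

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class HiveJdbcUrlBuilder {

    // Append Hive configuration properties to a hive2 JDBC URL as the
    // "?key=value;key=value" conf list, mirroring the URL that worked above.
    static String withConf(String baseUrl, Map<String, String> conf) {
        if (conf.isEmpty()) {
            return baseUrl;
        }
        String confList = conf.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(";"));
        return baseUrl + "?" + confList;
    }

    public static void main(String[] args) {
        // TreeMap keeps the conf keys in a stable order
        Map<String, String> conf = new TreeMap<>();
        conf.put("hive.execution.engine", "spark");
        System.out.println(withConf("jdbc:hive2://<ip>:10000", conf));
        // -> jdbc:hive2://<ip>:10000?hive.execution.engine=spark
    }
}
```

The resulting URL can then be set on the datasource in the factory method instead of (or alongside) `setConnectionProperties`.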

Upvotes: 2
