Roshan Vinil
Roshan Vinil

Reputation: 11

Iterating through Resultsets in threads cause Exceptions. ResultSets of queries to Hive using JDBC connections

Doing an evaluation of Hive and its features. Theres a use-case where I need to iterate through a resultset in a separate thread. I can have many ResultSets and spawn a thread to process each one of them. Below is the code I have written for this use case.

public class ConcurrentRSIteration2 {
private static String[] tableNames = 
    { 
        "random_data1",
        "random_data2",
        "random_data3",
        "random_data4"
    };

public static void main(String args[]) throws Exception {
    String driverName = "org.apache.hive.jdbc.HiveDriver";
    Class.forName(driverName);
    Connection con = DriverManager.getConnection(
            "jdbc:hive2://127.0.0.1:10000/default", "hive", "");

    int length = tableNames.length;

    StringBuilder[] sql = new StringBuilder[length];
    PreparedStatement[] stmt = new PreparedStatement[length];
    Thread[] rsIterators = new Thread[length];

    for (int i = 0; i < length; i++) {
        sql[i] = new StringBuilder().
                    append("select * from ").
                    append(tableNames[i]);
        stmt[i] = con.prepareStatement(sql[i].toString());
        RSIterator2 rsIterator = new RSIterator2(stmt[i].executeQuery());

        rsIterators[i] = new Thread(rsIterator);
    }

    for (int i = 0; i < length; i++) {
        rsIterators[i].start();
    }
}
}

class RSIterator2 implements Runnable {

private ResultSet rs;

public RSIterator2(ResultSet rs) {
    this.rs = rs;
}

@Override
public void run() {
    try {
        System.out.println(this.hashCode() + " : " + rs);
        System.out.println(this.hashCode() + " : RS iteration started.");
        int i = 0;
        while (rs.next()) {
            i++;
        }
        System.out.println(this.hashCode() + " : RS iteration done.");
    } catch (SQLException e) {
        e.printStackTrace();
    } finally {
        try {
            rs.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}

}

Below is the stacktrace of the exception.

org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
    at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
    at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
    at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:501)
    at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:488)
    at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:360)
    at hivetrial.RSIterator2.run(ConcurrentRSIteration2.java:60)
    at java.lang.Thread.run(Unknown Source)`

I am new to Hive and might have overlooked few things. Trying to understand this Exception.

Upvotes: 0

Views: 1116

Answers (2)

duffymo
duffymo

Reputation: 308982

Classes in java.sql package are not thread safe.

Separating a ResultSet from its companion Statement is a bad idea. You should query, load the rows into an object or data structure, then close both in a finally block in separate try/catch blocks.

I would be remiss if I failed to point out connection pools. Why limit yourself to just one?

Upvotes: 2

user207421
user207421

Reputation: 311018

Your entire approach is founded on a fallacy. You are using a single connection to execute multiple queries. The database server will therefore sequentialize all the data returned, in the order the queries were executed. Using multiple threads to process a single stream doesn't begin to make sense.

You're also never closing the statements or the connection.

Upvotes: 2

Related Questions