Reputation: 11
I am doing an evaluation of Hive and its features. There's a use case where I need to iterate through a result set in a separate thread. I can have many ResultSets and spawn a thread to process each one of them. Below is the code I have written for this use case.
package hivetrial;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ConcurrentRSIteration2 {

    private static String[] tableNames = {
            "random_data1",
            "random_data2",
            "random_data3",
            "random_data4"
    };

    public static void main(String[] args) throws Exception {
        String driverName = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driverName);
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://127.0.0.1:10000/default", "hive", "");
        int length = tableNames.length;
        StringBuilder[] sql = new StringBuilder[length];
        PreparedStatement[] stmt = new PreparedStatement[length];
        Thread[] rsIterators = new Thread[length];
        // One PreparedStatement and ResultSet per table, all on the same connection.
        for (int i = 0; i < length; i++) {
            sql[i] = new StringBuilder()
                    .append("select * from ")
                    .append(tableNames[i]);
            stmt[i] = con.prepareStatement(sql[i].toString());
            RSIterator2 rsIterator = new RSIterator2(stmt[i].executeQuery());
            rsIterators[i] = new Thread(rsIterator);
        }
        // Start all iterator threads.
        for (int i = 0; i < length; i++) {
            rsIterators[i].start();
        }
    }
}

class RSIterator2 implements Runnable {

    private ResultSet rs;

    public RSIterator2(ResultSet rs) {
        this.rs = rs;
    }

    @Override
    public void run() {
        try {
            System.out.println(this.hashCode() + " : " + rs);
            System.out.println(this.hashCode() + " : RS iteration started.");
            int i = 0;
            while (rs.next()) {
                i++;
            }
            System.out.println(this.hashCode() + " : RS iteration done.");
        } catch (SQLException e) {
            e.printStackTrace();
        } finally {
            try {
                rs.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}
Below is the stack trace of the exception.
org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:501)
at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:488)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:360)
at hivetrial.RSIterator2.run(ConcurrentRSIteration2.java:60)
at java.lang.Thread.run(Unknown Source)
I am new to Hive and might have overlooked a few things. I am trying to understand this exception.
Upvotes: 0
Views: 1116
Reputation: 308982
Classes in the java.sql package are not thread safe.
Separating a ResultSet from its companion Statement is a bad idea. You should query, load the rows into an object or data structure, then close both in a finally block, in separate try/catch blocks.
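A minimal, untested sketch of that pattern (the RowLoader class and its loadRows helper are made-up names for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class RowLoader {

    // Execute the query, copy every row into memory, then close the
    // ResultSet and the Statement before any other thread sees the data.
    static List<Object[]> loadRows(Connection con, String table) throws SQLException {
        PreparedStatement stmt = null;
        ResultSet rs = null;
        List<Object[]> rows = new ArrayList<>();
        try {
            stmt = con.prepareStatement("select * from " + table);
            rs = stmt.executeQuery();
            int cols = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                Object[] row = new Object[cols];
                for (int c = 1; c <= cols; c++) {
                    row[c - 1] = rs.getObject(c);
                }
                rows.add(row);
            }
        } finally {
            // Close each resource in its own try/catch so a failure on one
            // does not prevent the other from being closed.
            if (rs != null) {
                try { rs.close(); } catch (SQLException e) { e.printStackTrace(); }
            }
            if (stmt != null) {
                try { stmt.close(); } catch (SQLException e) { e.printStackTrace(); }
            }
        }
        return rows;
    }
}

Only the in-memory List is handed to a worker thread; the JDBC objects never leave the thread that created them.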
I would be remiss if I failed to point out connection pools. Why limit yourself to just one?
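For example, a pool sized to the number of worker threads gives each worker its own connection. A rough sketch, assuming a pooling library such as HikariCP is on the classpath (the HivePool class name is made up):

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;

class HivePool {

    private static final HikariDataSource DS;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:hive2://127.0.0.1:10000/default");
        config.setUsername("hive");
        config.setPassword("");
        config.setMaximumPoolSize(4); // one connection per worker thread
        DS = new HikariDataSource(config);
    }

    // Each caller borrows its own Connection from the pool.
    static Connection getConnection() throws SQLException {
        return DS.getConnection();
    }
}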
Upvotes: 2
Reputation: 311018
Your entire approach is founded on a fallacy. You are using a single connection to execute multiple queries. The database server will therefore serialize all the data returned, in the order the queries were executed. Using multiple threads to process a single stream doesn't begin to make sense.
You're also never closing the statements or the connection.
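One way to address both points, sketched here with made-up class and table names, is to give every worker its own Connection and let try-with-resources close everything:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

class TableScanner implements Runnable {

    private final String table;

    TableScanner(String table) {
        this.table = table;
    }

    @Override
    public void run() {
        // Each thread owns its own connection, statement and result set,
        // and try-with-resources closes all three even if iteration fails.
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://127.0.0.1:10000/default", "hive", "");
             PreparedStatement stmt = con.prepareStatement("select * from " + table);
             ResultSet rs = stmt.executeQuery()) {
            int count = 0;
            while (rs.next()) {
                count++;
            }
            System.out.println(table + " : " + count + " rows");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Each scan can then be launched independently, e.g. new Thread(new TableScanner("random_data1")).start().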
Upvotes: 2