Reputation: 837
I read a post on Quora which says that Spark Thrift Server is related to Apache Thrift, which is a binary communication protocol. Spark Thrift Server is the interface to Hive, but how does Spark Thrift Server use Apache Thrift to communicate with Hive via a binary protocol/RPC?
Upvotes: 6
Views: 1695
Reputation: 328
You can bring up the Spark Thrift Server on AWS EMR using the following command:
sudo /usr/lib/spark/sbin/start-thriftserver.sh --master yarn-client
On EMR, the default port for the Spark Thrift Server is 10001.
To connect to the Spark Thrift Server with beeline on EMR, use the following command:
/usr/lib/spark/bin/beeline -u 'jdbc:hive2://:10001/default' -e "show databases;"
By default, the Hive Thrift Server (HiveServer2) is always up and running on EMR, but the Spark Thrift Server is not.
You can also connect any application to the Spark Thrift Server using ODBC/JDBC, and you can monitor queries on the EMR cluster by clicking the Application Master link for the "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2" job in the YARN ResourceManager UI (port 8088).
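The EMR steps in this answer can be wrapped in a few shell functions. This is a minimal sketch, assuming it runs on the EMR master node with the default /usr/lib/spark layout; the function names are hypothetical, and the host is passed in explicitly because the beeline URL in the command above leaves it empty.

```shell
#!/bin/sh
# Hypothetical helpers around the EMR commands from this answer.
SPARK_DIR=/usr/lib/spark
STS_PORT=10001   # default Spark Thrift Server port on EMR

# Print the HiveServer2-style JDBC URL that Spark Thrift Server accepts.
# $1 = host (e.g. the EMR master node's DNS name), $2 = database (default: "default")
sts_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$STS_PORT" "${2:-default}"
}

# Start Spark Thrift Server on YARN (EMR does not start it by default).
# "--master yarn --deploy-mode client" is the Spark 2+ spelling of the
# older "--master yarn-client" value used in the command above.
sts_start() {
  sudo "$SPARK_DIR/sbin/start-thriftserver.sh" --master yarn --deploy-mode client
}

# Run a single statement through beeline. $1 = host, $2 = SQL statement.
sts_query() {
  "$SPARK_DIR/bin/beeline" -u "$(sts_jdbc_url "$1")" -e "$2"
}
```

Usage on the master node might look like `sts_start` followed by `sts_query "$(hostname -f)" "show databases;"`. Keeping the URL in one function means the port and database only need to change in one place.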
Upvotes: 0
Reputation: 16086
Spark Thrift Server is a Hive-compatible interface for Spark.
That means it implements the HiveServer2 interface: you can connect to it with beeline, but almost all of the computation is done by Spark, not Hive.
In previous versions, the query parser came from Hive. Currently, Spark Thrift Server uses Spark's own query parser.
Apache Thrift is a framework for developing RPC (Remote Procedure Call) services, so there are many implementations built on Thrift. Cassandra also used Thrift; it has since been replaced by the Cassandra native protocol.
So, Apache Thrift is a framework for developing RPC services, while Spark Thrift Server is an implementation of the Hive (HiveServer2) protocol that uses Spark as its computation framework.
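To make the Thrift part concrete: HiveServer2's client API (the TCLIService service) is defined in Thrift IDL, and in binary transport mode a client such as beeline sends Thrift-encoded calls like OpenSession or ExecuteStatement over TCP. The sketch below, assuming Thrift's strict binary protocol (TBinaryProtocol), writes only the message envelope that precedes a call's arguments. The helper names are hypothetical, and as far as I know HiveServer2 normally wraps these bytes in a SASL transport, which is not shown.

```shell
#!/bin/sh
# Hypothetical sketch of a TBinaryProtocol (strict mode) CALL envelope.
THRIFT_VERSION_1=$(( 0x80010000 ))  # strict-mode version marker
THRIFT_CALL=1                       # message type: CALL

# Write $1 as a 4-byte big-endian integer.
write_i32() {
  # Inner printf builds "\ooo" octal escapes; outer printf turns them into bytes.
  printf "$(printf '\\%03o\\%03o\\%03o\\%03o' \
    $(( ($1 >> 24) & 255 )) $(( ($1 >> 16) & 255 )) \
    $(( ($1 >> 8) & 255 )) $(( $1 & 255 )))"
}

# Write a Thrift string: i32 length, then the raw bytes.
# ${#1} counts characters, so this assumes an ASCII method name.
write_string() {
  write_i32 "${#1}"
  printf '%s' "$1"
}

# Envelope for a CALL: (version | type), method name, sequence id.
# The method's argument struct (fields, then a STOP byte) would follow.
thrift_call_header() {
  write_i32 $(( THRIFT_VERSION_1 | THRIFT_CALL ))
  write_string "$1"
  write_i32 "${2:-0}"
}
```

Running `thrift_call_header OpenSession 1 | od -An -tx1` should show 80 01 00 01 (version and CALL type), 00 00 00 0b (name length 11), the ASCII bytes of "OpenSession", and finally 00 00 00 01 (sequence id).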
For more details, please see this link from @RussS
Upvotes: 4