Reputation: 3507
Can someone spell out the differences between using the Spark SQL CLI and Thriftserver/Beeline to query or modify data in Hive? The Spark SQL documentation mentions both, but when would you use one or the other? Or are they functionally equivalent alternatives?
Upvotes: 2
Views: 1262
Reputation: 21830
For clarification:
spark-sql is a command-line program that starts its own single instance of Spark. You interact with it through a MySQL-like shell prompt, and it makes use of the spark-warehouse directory and related features.
Spark with Thriftserver is an application that exposes a running instance of Spark over JDBC, so that external clients can connect to it. https://community.hortonworks.com/questions/33715/why-do-we-need-to-setup-spark-thrift-server.html
Beeline is a query client that connects to a running Hive2 (HiveServer2-style) JDBC endpoint, which is why the Spark documentation uses Beeline to test that the JDBC connection is in fact working. Note: other query tools such as SQL Workbench can also connect to Spark with Thriftserver, provided they load the proper Hive2 JDBC driver and jars.
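To make the distinction concrete, here is a rough Python sketch that only builds (and prints) the two command lines, without running anything. The `/opt/spark` install path, the `alice` user name and the helper function names are assumptions made up for illustration; port 10000 is the usual Thriftserver default, and `-e`, `-u` and `-n` are the standard spark-sql / Beeline options for an inline query, a JDBC URL and a user name.

```python
import shlex

# Hypothetical install location; adjust to your own SPARK_HOME.
SPARK_HOME = "/opt/spark"


def spark_sql_cli_command(query, spark_home=SPARK_HOME):
    """Command for the spark-sql CLI.

    The CLI launches its own Spark instance and runs the query inside it,
    so no server has to be running beforehand.
    """
    return [f"{spark_home}/bin/spark-sql", "-e", query]


def thrift_jdbc_url(host="localhost", port=10000, database="default"):
    """JDBC URL of an already running Thriftserver (Hive2 protocol)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def beeline_command(query, url, user=None, spark_home=SPARK_HOME):
    """Command for Beeline, a JDBC client.

    Beeline does not start Spark itself; it connects to the Thriftserver
    at `url`, which must have been started separately.
    """
    cmd = [f"{spark_home}/bin/beeline", "-u", url]
    if user:
        cmd += ["-n", user]
    cmd += ["-e", query]
    return cmd


if __name__ == "__main__":
    query = "SHOW TABLES"
    for cmd in (
        spark_sql_cli_command(query),
        beeline_command(query, thrift_jdbc_url(), user="alice"),
    ):
        # Print a shell-safe version of each command line.
        print(" ".join(shlex.quote(part) for part in cmd))
```

The first command is self-contained. The second only works once the Thriftserver has been started (Spark ships `sbin/start-thriftserver.sh` for that), and any other JDBC tool could take Beeline's place by using the same `jdbc:hive2://` URL.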
Upvotes: 2